OpenVoice AI clones voices accurately

By: Dale Arasa - 2 years ago

The Canadian startup MyShell collaborated with MIT and Tsinghua University researchers to create OpenVoice AI. The tool can clone voices and offers granular control over tone, emotion, accent, rhythm, and more. Moreover, it can recite phrases in different languages while conveying various emotions, opening up numerous potential applications.

For example, you could have a high-quality AI voiceover for your YouTube videos instead of paying a professional. However, advanced voice cloning technology may also be put to nefarious purposes, such as scams. That is why we must learn more about these tools as they become more widely available.

This article will elaborate on the many features of OpenVoice AI. Later, I will discuss similar tools from Microsoft and Cornell University to illustrate how far this technology has come.

How does OpenVoice AI work?

Today, we proudly open source our OpenVoice algorithm, embracing our core ethos – AI for all. Experience it now: https://t.co/zHJpeVpX3t. Clone voices with unparalleled precision, with granular control of tone, from emotion to accent, rhythm, pauses, and intonation, using just a… pic.twitter.com/RwmYajpxOt

— MyShell (@myshell_ai) January 2, 2024

MyShell explains on its GitHub page that OpenVoice AI has three advantages over similar tools. First, it has Accurate Tone Color Cloning, enabling it to clone the reference tone color and generate speech in multiple languages and accents.

Second, Flexible Voice Style Control lets users modify specific characteristics of a voice sample, such as emotion, accent, pauses, and intonation. Third, Zero-shot Cross-lingual Voice Cloning lets OpenVoice generate speech in languages not included in its multi-lingual training dataset.


Learn about the capabilities of this artificial intelligence on the MyShell research website, where you can hear the tool make a voice sample recite various lines.

It can also generate recitations that convey sadness, happiness, and other emotions. Moreover, the program can adjust voices to talk in different accents, like British and Indian.

Many similar programs can only generate speech in English and a few other languages. OpenVoice AI, however, can combine several languages in a single passage.

Other vocal AI programs

Microsoft has also developed a similar tool called VALL-E, which creates personalized speech from text and acoustic prompts using a neural codec language model. In other words, it can generate speech from input text and a three-second recording of a speaker's voice.

This feature enables VALL-E to produce speech that sounds nearly indistinguishable from a real person's voice, similar to OpenVoice AI. Its researchers say it could "preserve the speaker's emotion and acoustic environment of the acoustic prompt in synthesis."

This Microsoft program can also add ambient noise to enhance realism, setting its results apart from rudimentary text-to-speech tools. Meanwhile, a Cornell University student took AI speech recognition to the next level by integrating it into a pair of glasses.

Doctoral student Ruidong Zhang calls his device the EchoSpeech. The current version enables users to communicate with others via smartphone. It uses an AI-powered sonar system to read the user’s lips.

Sonar works by sending out sound waves that bounce off surrounding objects. A receiver then picks up the returning echoes, and their timing reveals where those objects are, mapping the environment.

It's similar to how bats use echolocation to navigate in pitch darkness. Zhang's AI glasses use sonar to track the shapes and movements of a user's mouth as they speak.
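To make the basic sonar idea concrete, here is a minimal Python sketch of the textbook time-of-flight calculation: an echo's round-trip time, multiplied by the speed of sound and halved, gives the distance to the surface that reflected it. The speed of sound used here (roughly 343 m/s in air at room temperature) and the function name are illustrative assumptions; this shows the general principle only, not EchoSpeech's actual code.

```python
# Approximate speed of sound in air at about 20 °C, in metres per second.
# This constant is an illustrative assumption, not a value from EchoSpeech.
SPEED_OF_SOUND_M_PER_S = 343.0


def echo_distance_m(round_trip_s: float, speed: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the distance (in metres) to a reflecting surface.

    The sound travels out to the surface and back again, so the
    one-way distance is half of the total path length.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed * round_trip_s / 2.0


if __name__ == "__main__":
    # An echo that returns after 0.2 milliseconds came from a surface
    # roughly 3.4 centimetres away, about the scale of glasses-to-mouth.
    print(f"{echo_distance_m(0.0002) * 100:.2f} cm")
```

Halving the path is the key step: the measured time covers the trip out and back, so using it directly would double the estimated distance.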

The device feeds those echoes to an algorithm from Cornell's Smart Computer Interfaces for Future Interactions (SciFi) Lab. This artificial intelligence analyzes the resulting echo profiles with 95% accuracy.
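An "echo profile" can be pictured as a record of how strongly the transmitted sound comes back at each delay. The minimal Python sketch below assumes the simplest version of that idea: cross-correlating a received signal with a known transmitted pulse, a standard matched-filter technique in sonar. The pulse, the signal, the 50 kHz sample rate, and the function names are all hypothetical, and the SciFi Lab's actual signal processing and AI model are not shown here.

```python
from typing import List, Sequence


def cross_correlate(received: Sequence[float], pulse: Sequence[float]) -> List[float]:
    """Correlate the received signal with the transmitted pulse at every lag.

    Each value measures how closely the received signal matches the pulse
    starting at that sample; the whole list is a simple "echo profile".
    """
    n, m = len(received), len(pulse)
    return [
        sum(received[lag + i] * pulse[i] for i in range(m))
        for lag in range(n - m + 1)
    ]


def strongest_echo_lag(received: Sequence[float], pulse: Sequence[float]) -> int:
    """Return the lag (in samples) where the echo profile peaks."""
    profile = cross_correlate(received, pulse)
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    sample_rate_hz = 50_000  # hypothetical sampling rate
    pulse = [1.0, -1.0, 1.0, -1.0]  # hypothetical transmitted pulse

    # Simulate a quiet recording containing one echo of the pulse,
    # attenuated to half strength and arriving 7 samples late.
    received = [0.0] * 20
    for i, p in enumerate(pulse):
        received[7 + i] += 0.5 * p

    lag = strongest_echo_lag(received, pulse)
    print(f"strongest echo at sample {lag}, delay {lag / sample_rate_hz:.5f} s")
```

In a real system, the profile would change as the lips and skin move, and it is that changing pattern, rather than a single peak, that a learned model would interpret.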

The EchoSpeech sets itself apart from similar gadgets through this sonar method. It provides more accurate speech recognition, making the glasses more useful.

Also, these AI glasses look inconspicuous due to their tiny microphones and speakers. As a result, the EchoSpeech is more practical for daily use.

The current version lets people communicate by displaying their words on a smartphone. As a result, they can communicate in settings where speaking aloud is inconvenient, such as a noisy street or a quiet library.

Conclusion

MIT, Tsinghua University, and the startup MyShell created an AI voice cloning program that closely imitates human speech. With it, you can make a voice sample speak passages in several languages.

OpenVoice AI replicates subtle characteristics like intonation and pauses to deliver excellent accuracy. It can also handle sentences that mix different languages.

Learn more about this voice cloning artificial intelligence on its arXiv webpage. Check out other digital tips and trends at Inquirer Tech.