Microsoft AI Voice Tool Mimics Voices From Three-Second Clips

By: Dale Arasa - @inquirerdotnet

04:09 PM January 10, 2023

People have been buzzing about AI-generated text and art. Now, we should look out for the next step in artificial intelligence: Microsoft's voice AI.

The tech giant announced its latest AI creation named VALL-E. It can say anything in your voice based on a three-second sound bite.

At the time of writing, it is not available for public use. Eventually, it could become a mainstream tool and improve, similar to ChatGPT and DALL-E.

How does the Microsoft AI VALL-E work?

This represents the latest Microsoft AI Voice tool

Photo Credit: www.smartsheet.com

Microsoft published a GitHub report discussing VALL-E in layman’s terms. The overview says it relies on a text prompt and an acoustic prompt.

The former indicates what someone is saying in text form. Meanwhile, the latter is another term for a three-second recording of their voice.

Then, VALL-E uses Neural Codec Language Modeling to turn them into personalized speech. It combines the user’s preferences and machine learning.
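To make the two inputs more concrete, here is a minimal Python sketch of how a text prompt and a three-second acoustic prompt might be packaged as tokens before a model sees them. It is not Microsoft's code. The names `SynthesisRequest`, `quantize`, and `build_model_input`, the 16 kHz sample rate, and the simple quantizer are all assumptions made for illustration, and the sketch stops before any speech is generated.

```python
from dataclasses import dataclass
from typing import Dict, List

SAMPLE_RATE_HZ = 16_000  # assumed sample rate, for illustration only
PROMPT_SECONDS = 3       # the three-second clip described in the article


@dataclass
class SynthesisRequest:
    """Hypothetical container for the two prompts."""
    text_prompt: str              # what the voice should say
    acoustic_prompt: List[float]  # recorded samples of the speaker, in [-1, 1]


def quantize(samples: List[float], levels: int = 256) -> List[int]:
    """Toy stand-in for a neural audio codec: map each sample to an integer token."""
    tokens = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        token = int((clipped + 1.0) / 2.0 * levels)
        tokens.append(min(levels - 1, token))
    return tokens


def build_model_input(request: SynthesisRequest) -> Dict[str, List[int]]:
    """Turn both prompts into token sequences a language model could condition on."""
    expected = SAMPLE_RATE_HZ * PROMPT_SECONDS
    if len(request.acoustic_prompt) != expected:
        raise ValueError(f"expected {expected} samples, got {len(request.acoustic_prompt)}")
    # Assumption: characters stand in for the text tokens a real system would use.
    text_tokens = [ord(ch) for ch in request.text_prompt.lower()]
    acoustic_tokens = quantize(request.acoustic_prompt)
    return {"text_tokens": text_tokens, "acoustic_tokens": acoustic_tokens}


if __name__ == "__main__":
    silent_clip = [0.0] * (SAMPLE_RATE_HZ * PROMPT_SECONDS)
    model_input = build_model_input(SynthesisRequest("Hi", silent_clip))
    print(len(model_input["text_tokens"]), len(model_input["acoustic_tokens"]))
```

A real system would replace the toy quantizer with a trained audio codec and pass both token sequences to a trained model that predicts the tokens of the new speech.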

As a result, the new Microsoft AI provides voice statements that are nearly indistinguishable from a real person’s voice.

Moreover, the VALL-E researchers say it could “preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.”

In other words, the samples could have ambient noise, further improving their realism.

Other text-to-speech tools have an unrealistic cadence and eerily absent background noise when “speaking.”

What is the potential real-world impact of VALL-E?

This represents the potential uses of VALL-E.

Photo Credit: builtin.com

Some people dread the day when artificial intelligence can speak like humans. After all, we have seen many issues with AI-generated content.

Also, the internet has many deepfake clips that animate photos of prominent figures so they appear to sing or recite a silly quote.

Of course, we merely laugh at them because they are fake. Imagine if those deepfakes could closely replicate a politician’s or celebrity’s voice.

They could potentially disrupt governments and businesses. After all, the recent Twitter debacle showed the world the potential damage from a single tweet.

For example, pharmaceutical company Eli Lilly’s stock fell shortly after a fake account tweeted that it would offer insulin for free.

If Microsoft AI falls into the wrong hands, telemarketers could use it to scam more people with automated realistic phone calls.

On the other hand, it could become a boon for people who need to promote their products and services online.

A business owner could use VALL-E to produce a promotional blurb. Then, they could overlay the clip onto their online ad.

Conclusion

The latest Microsoft AI claims it can mimic your voice using a three-second clip. The results are allegedly so realistic that they could include ambient noise.

At the time of writing, the company does not allow the public to use VALL-E. Still, it could become widely available soon, similar to AI-generated art and text.

In response, you must adapt using the latest digital news and updates. Fortunately, you can start by following Inquirer Tech.

TOPICS: AI, Artificial Intelligence, Microsoft, Trending
