Microsoft VASA-1 makes faces talk and sing realistically

Have you ever watched the noontime show "Eat Bulaga"? You may have noticed its segments where still human portraits are made to "talk" like family members.

The gag is that they speak in awkward, robotic voices with exaggerated American accents.

Recent artificial intelligence developments have transcended these caricatures. Microsoft VASA-1 is the latest example. 

The major tech firm announced it has created an artificial intelligence model that can make still faces talk, and even sing, realistically.

What is Microsoft VASA-1?

The company, co-founded by Bill Gates, announced VASA, an AI framework for generating "talking faces of virtual characters" from a single picture and a speech audio clip.

Microsoft calls the first model VASA-1. It can produce lip movements that synchronize closely with sound clips. 

Moreover, the AI model captures numerous facial nuances and natural head motions that make the generated faces convincingly lifelike.

The company said VASA-1 came from core innovations like a holistic facial dynamics and head movement generation model that works in a face latent space. 

It also drew on numerous video samples to build an "expressive and disentangled face latent space." As a result, VASA clips combine close lip synchronization with expressive facial nuances and natural head motion.
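Microsoft has not released code for VASA-1, but a rough sketch can illustrate the data flow its description implies: an appearance latent is extracted once from the single portrait, per-frame facial dynamics and head-motion latents are generated from the audio, and the two are decoded into video frames. Everything below is an illustrative assumption; the stand-in "networks" are random NumPy matrices, and the names and dimensions are invented for the sketch, not Microsoft's actual architecture.

```python
# Conceptual sketch (not Microsoft's code) of a VASA-style pipeline:
# one appearance latent from a single portrait, one motion latent per
# audio frame, then a decoder that combines them into video frames.
# All networks are stand-in random linear maps with assumed dimensions.
import numpy as np

rng = np.random.default_rng(0)

APPEARANCE_DIM = 128   # identity/appearance latent (assumed size)
MOTION_DIM = 64        # facial dynamics + head-motion latent (assumed size)
AUDIO_DIM = 32         # per-frame speech feature (assumed size)

# Stand-ins for the learned encoder, motion generator, and decoder.
W_app = rng.standard_normal((APPEARANCE_DIM, 3 * 64 * 64)) * 0.01
W_gen = rng.standard_normal((MOTION_DIM, AUDIO_DIM + MOTION_DIM)) * 0.1
W_dec = rng.standard_normal((3 * 64 * 64, APPEARANCE_DIM + MOTION_DIM)) * 0.01


def encode_appearance(portrait: np.ndarray) -> np.ndarray:
    """Map a single 64x64 RGB portrait to a static appearance latent."""
    return np.tanh(W_app @ portrait.reshape(-1))


def generate_motion(audio_frames: np.ndarray) -> np.ndarray:
    """Produce one motion latent per audio frame.

    The real model is described as generating holistic facial dynamics and
    head movement in a learned latent space; here a recurrent linear map
    fakes that so the shapes and data flow are visible.
    """
    motion = np.zeros(MOTION_DIM)
    latents = []
    for audio in audio_frames:
        motion = np.tanh(W_gen @ np.concatenate([audio, motion]))
        latents.append(motion)
    return np.stack(latents)


def decode_frames(appearance: np.ndarray, motions: np.ndarray) -> np.ndarray:
    """Combine the fixed appearance latent with each motion latent into a frame."""
    frames = [
        (W_dec @ np.concatenate([appearance, m])).reshape(3, 64, 64)
        for m in motions
    ]
    return np.stack(frames)


if __name__ == "__main__":
    portrait = rng.random((3, 64, 64))      # the single input picture
    audio = rng.random((25, AUDIO_DIM))     # ~1 second of speech features
    video = decode_frames(encode_appearance(portrait), generate_motion(audio))
    print(video.shape)                      # (25, 3, 64, 64): one frame per audio frame
```

The point of the sketch is the separation it mirrors: identity comes from the picture once, while expressions, lip movement, and head pose are driven frame by frame from the audio in a latent space, which is how a single photo can be animated by any speech clip.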

Microsoft reminds the public that it used virtual, non-existent identities created by the AI programs DALL-E 3 and StyleGAN2, except for the Mona Lisa sample.

These portraits do not impersonate any real people. Microsoft imposed these limitations because it understands the potential for misuse.

The company stated, “We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.” 
