Google’s VLOGGER AI brings photos to life

Do you remember the funny portraits from the Harry Potter movies? Sometimes, Harry and the gang would speak with these paintings, which might offer clues to their problems.

Google recently created an AI program that can bring still images to life like those portraits: VLOGGER.


The new artificial intelligence can make a picture speak and move in sync with an audio clip, producing a short video and opening new possibilities for this technology.

How does VLOGGER work?

VentureBeat says the “AI model can take a photo of a person and an audio clip as input, and then output a video that matches the audio, showing the person speaking the words and making corresponding facial expressions, head movements, and hand gestures.”
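Based on that description, a minimal sketch of the photo-plus-audio workflow might look like the code below. Google has not published a library or API for VLOGGER alongside the research, so the vlogger package, the AvatarSynthesizer class, and the checkpoint name are hypothetical placeholders used only to illustrate the input/output interface.

```python
# Minimal sketch of the photo + audio -> talking video workflow described above.
# NOTE: the "vlogger" package, AvatarSynthesizer class, and checkpoint name are
# hypothetical; they illustrate the interface, not a real released API.
from pathlib import Path

from vlogger import AvatarSynthesizer  # hypothetical import


def animate_portrait(photo: str, audio: str, output: str) -> None:
    """Drive a still photo with an audio clip and save the resulting video."""
    model = AvatarSynthesizer.from_pretrained("vlogger-base")  # hypothetical checkpoint
    video_bytes = model.generate(
        image=Path(photo).read_bytes(),  # the person to animate
        audio=Path(audio).read_bytes(),  # speech that drives lips, head, and hand motion
    )
    Path(output).write_bytes(video_bytes)  # short clip of the person speaking the words


animate_portrait("portrait.jpg", "speech.wav", "talking_portrait.mp4")
```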

Google researcher Enric Corona led a team to train an artificial intelligence model on a large video dataset to make these features possible.

They call the dataset MENTOR, which contains over 800,000 diverse identities and 2,200 hours of video. 

This vast amount of information allows VLOGGER to generate videos of people of various ethnicities, ages, and other characteristics. 

The researchers compiled their work in the paper titled “VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis.”

“In contrast to previous work, our method does not require training for each person, does not rely on face detection and cropping, generates the complete image (not just the face or the lips), and considers a broad spectrum of scenarios (e.g. visible torso or diverse subject identities) that are critical to correctly synthesize humans who communicate,” the authors wrote.

VentureBeat says this AI model might enable actors to license detailed 3D models of themselves for future performances. 

The program may also automatically dub videos into other languages by swapping the audio track. 

Perhaps the technology could help make photorealistic avatars for video games and virtual reality. Still, the program has limitations.

For example, VLOGGER videos are short and have static backgrounds. Moreover, the subjects don’t occupy a 3D environment, and their movements can look unnatural.
