Google’s VLOGGER AI brings photos to life | Inquirer Technology

11:37 AM March 21, 2024

Do you remember the funny portraits from the Harry Potter movies? Sometimes, Harry and the gang would speak with these paintings, which occasionally offered clues to their problems.

Google recently created an AI program that can bring still images to life like those portraits: VLOGGER.

The new artificial intelligence can take a photo and an audio clip and make the person in the picture speak and move in a short video, opening new possibilities for this technology.

How does VLOGGER work?

VentureBeat says the “AI model can take a photo of a person and an audio clip as input, and then output a video that matches the audio, showing the person speaking the words and making corresponding facial expressions, head movements, and hand gestures.”

Google researcher Enric Corona led a team that trained an artificial intelligence model on a large video dataset to make these features possible.

They call the dataset MENTOR, which contains over 800,000 diverse identities and 2,200 hours of video. 

This vast amount of information allows VLOGGER to generate videos of people of various ethnicities, ages, and other characteristics. 

The researchers described their work in a paper titled “VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis.”

“In contrast to previous work, our method does not require training for each person, does not rely on face detection and cropping, generates the complete image (not just the face or the lips), and considers a broad spectrum of scenarios (e.g. visible torso or diverse subject identities) that are critical to correctly synthesize humans who communicate,” the authors wrote.

VentureBeat says this AI model might enable actors to license detailed 3D models of themselves for future performances. 

The program may also automatically dub videos into other languages by swapping the audio track. 

Perhaps the technology could help make photorealistic avatars for video games and virtual reality. Still, the program has limitations.

For example, VLOGGER videos are short and have static backgrounds. Moreover, the subjects in the videos don’t occupy a 3D environment, and their movements can look unnatural.

