Google’s VLOGGER AI brings photos to life

By: Dale Arasa - @inquirerdotnet

INQUIRER.net / 11:37 AM March 21, 2024

Do you remember the funny portraits from the Harry Potter movies? Sometimes, Harry and the gang would speak with these paintings, which might offer clues to their problems.

Google recently created an AI program that can bring still images to life like those portraits: VLOGGER.

The new artificial intelligence can take a photo and an audio clip and make the person in the picture speak and move in a short video, opening new possibilities for this technology.

How does VLOGGER work?

"Google researchers unveil ‘VLOGGER’, an AI that can bring still photos to life" Thanks for making #socialengineering attacks accessible to everyone 🙏 https://t.co/lVmY5Ck2Jy
— HackManac (@H4ckManac) March 20, 2024

VentureBeat says the “AI model can take a photo of a person and an audio clip as input, and then output a video that matches the audio, showing the person speaking the words and making corresponding facial expressions, head movements, and hand gestures.”

Google researcher Enric Corona led a team to train an artificial intelligence model on a large video dataset to make these features possible.

They call the dataset MENTOR. It contains over 800,000 diverse identities and 2,200 hours of video.

This vast amount of information allows VLOGGER to generate videos of people of various ethnicities, ages, and other characteristics.

The researchers compiled their work in the paper titled, “VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis.”

“In contrast to previous work, our method does not require training for each person, does not rely on face detection and cropping, generates the complete image (not just the face or the lips), and considers a broad spectrum of scenarios (e.g. visible torso or diverse subject identities) that are critical to correctly synthesize humans who communicate,” the authors wrote.

VentureBeat says this AI model might enable actors to license detailed 3D models of themselves for future performances.

The program may also automatically dub videos into other languages by swapping the audio track.

Perhaps the technology could help make photorealistic avatars for video games and virtual reality. Still, the program has limitations.

For example, VLOGGER videos are short and have static backgrounds. Moreover, the subjects in the videos do not occupy a 3D environment, and their movements can look unnatural.

TOPICS: Google, technology

