Google Gemini 2.0 prepares for a future of AI agents

By: Dale Arasa - 1 year ago

Google unveiled Gemini 2.0, its latest, most powerful AI model for the “agentic era.”

The search engine company also released an experimental version of Gemini 2.0 Flash. Developers can now build with this model in the Gemini API via Google AI Studio and Vertex AI.

Moreover, Gemini 2.0 powers AI agent prototypes Project Astra, Project Mariner, and Jules.

What are Gemini 2.0’s features?

Google’s official blog, The Keyword, explains that Gemini 2.0 will prepare the company for AI’s “agentic era.”

The new AI model can generate images and audio, exhibit enhanced reasoning and planning, and show real-world decision-making capabilities.

Gemini 2.0 Flash supports multimodal inputs and outputs. These include natively generated images mixed with text and customizable, multilingual text-to-speech audio.

As mentioned, the Gemini 2.0 release included the following experimental agents:

Project Astra is a universal AI assistant that demonstrates near-human conversational speech and integration with Google Search, Lens, and others.
Project Mariner can reason across text, images, and interactive elements, such as forms in a browser. As a result, it could complete end-to-end web tasks, potentially redefining web automation.
Jules is an AI assistant for developers. It integrates with GitHub to address coding issues and propose solutions, generate plans, and perform tasks autonomously.

Google DeepMind is also working with gaming companies like Supercell to create AI game agents.

These digital companions can interpret game actions, letting them suggest strategies to players in real time.

Why are companies building AI agents?

We’re really grateful to Jan for everything he's done for OpenAI, and we know he'll continue to contribute to the mission from outside. In light of the questions his departure has raised, we wanted to explain a bit about how we think about our overall strategy. First, we have… https://t.co/djlcqEiLLN
— Greg Brockman (@gdb) May 18, 2024

In 2023, Inquirer Tech reported that AI models had scoured the entire internet for training data.

This shortage of fresh training data challenged Big Tech’s race to build bigger AI models. For example, many criticized OpenAI’s latest AI system for its disappointing improvements.

In response, the biggest tech firms realized they needed to take AI development in a new direction: AI agents.

AI agents can act autonomously to achieve open-ended, loosely defined goals.

To do so, they can make long-term plans and use tools such as an internet browser. They can also try new methods whenever they receive new information.

OpenAI President Greg Brockman posted his and CEO Sam Altman’s statement on X regarding this long-term goal:

“This seems like as good of a time as any to talk about how we view the future,” they said.

“Users will increasingly interact with systems, composed of many multimodal models plus tools…”

“…which can take actions on their behalf, rather than talking to a single model.”

Google CEO Sundar Pichai shared a similar outlook on The Keyword’s Gemini 2.0 page:

“We have been investing in developing more agentic models, meaning they can understand more about the world around you…”

“…think multiple steps ahead, and take action on your behalf, with your supervision.”

To illustrate Pichai’s vision for Gemini 2.0, here’s how an AI agent could help you book airline tickets:

First, it would review your email or calendar to learn when and where you will travel. It would also remember your travel preferences, such as window seats.
Then, the agent would research and choose the best flight.
Next, the digital assistant would retrieve your personal and payment information.
Finally, it would use an internet browser to open the airline’s booking system and buy your tickets.
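
For readers who code, the steps above can be sketched in a few lines of Python. This is only an illustration and not how Gemini 2.0 or any Google agent actually works. Every name in it (Trip, Flight, read_trip_from_calendar, choose_flight, book_trip, the calendar format, and the "payment_token" field) is made up for this example, and the browser step is replaced by a simple stand-in that returns a receipt. The confirm callback reflects Pichai’s point that the agent acts “with your supervision.”

```python
from dataclasses import dataclass


@dataclass
class Trip:
    destination: str
    date: str


@dataclass
class Flight:
    airline: str
    price: float
    seat: str


def read_trip_from_calendar(calendar):
    # Step 1: find the first calendar entry marked as travel.
    # The dictionary layout is a hypothetical format, not a real calendar API.
    entry = next(e for e in calendar if e.get("type") == "travel")
    return Trip(destination=entry["destination"], date=entry["date"])


def choose_flight(flights, preferred_seat):
    # Step 2: keep flights that offer the remembered seat preference,
    # fall back to all flights if none match, then pick the cheapest.
    matching = [f for f in flights if f.seat == preferred_seat]
    return min(matching or flights, key=lambda f: f.price)


def book_trip(calendar, flights, profile, confirm):
    trip = read_trip_from_calendar(calendar)
    flight = choose_flight(flights, profile["preferred_seat"])

    # Step 3: retrieve stored payment details (a placeholder token here).
    payment = profile["payment_token"]

    # Step 4: a real agent would drive a browser at this point.
    # This sketch asks the user to approve first, then returns a receipt.
    if not confirm(trip, flight):
        return None
    return {
        "destination": trip.destination,
        "date": trip.date,
        "airline": flight.airline,
        "price": flight.price,
        "paid_with": payment,
    }
```

Calling book_trip with a calendar list, a list of Flight objects, a profile dictionary, and a function that returns True or False would either produce a receipt or stop before buying anything, depending on whether the user approves.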