Google introduces Gemini 2.5 Computer Use AI model: Here’s how it works

HIGHLIGHTS

Google has introduced the Gemini 2.5 Computer Use model.

The Gemini 2.5 Computer Use model is available through the Gemini API in Google AI Studio and Vertex AI.

It is built on Gemini 2.5 Pro's visual understanding and reasoning capabilities.

Google has introduced Gemini 2.5 Computer Use, a new AI model designed to interact directly with web and mobile interfaces. This model, built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities, allows AI agents to perform tasks on computers by clicking, typing, scrolling, and navigating websites, much like a human would. 

The Gemini 2.5 Computer Use model is available through the Gemini API in Google AI Studio and Vertex AI. Read on to learn how this new AI model works.

How it works

The Gemini 2.5 Computer Use model’s core capabilities are “exposed through the new `computer_use` tool in the Gemini API and should be operated within a loop,” the tech giant explains in a blog post. Inputs to the tool include the user’s request, a screenshot of the current environment, and a history of recent actions. Developers can also choose to exclude certain UI actions or add custom ones.

The model then analyses these inputs and responds with a function call, such as clicking or typing. Some actions, like making a purchase, may require user confirmation. The client-side code then executes the action and sends back a new screenshot and the current URL, which restarts the loop. The process repeats until the task is finished, an error occurs, or a safety/user stop is triggered.
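
The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Google's client code: it does not call the Gemini API, and every name in it (`Action`, `Observation`, `run_agent_loop`, `fake_model`, `fake_browser`) is a hypothetical stand-in invented for this example. The "model" here is a scripted function that returns two actions and then signals that it is done.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """Hypothetical stand-in for a function call returned by the model."""
    name: str                                  # e.g. "click" or "type"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False           # e.g. for a purchase


@dataclass
class Observation:
    """What the client sends back after each step."""
    screenshot: bytes
    url: str


def run_agent_loop(
    request: str,
    propose_action: Callable[[str, Observation, List[Action]], Optional[Action]],
    execute_action: Callable[[Action], Observation],
    confirm: Callable[[Action], bool],
    first_observation: Observation,
    max_steps: int = 10,
) -> Tuple[str, List[Action]]:
    """Run the request -> action -> new screenshot loop until it ends."""
    history: List[Action] = []
    observation = first_observation
    for _ in range(max_steps):
        # The model sees the request, the latest screenshot/URL and past actions.
        action = propose_action(request, observation, history)
        if action is None:                     # model says the task is finished
            return "finished", history
        if action.needs_confirmation and not confirm(action):
            return "stopped_by_user", history  # user declined a risky action
        try:
            observation = execute_action(action)   # client performs the action
        except Exception:
            return "error", history
        history.append(action)
    return "step_limit", history


# Scripted stand-ins so the sketch runs without any real model or browser.
def fake_model(request, observation, history):
    script = [Action("click", {"x": 10, "y": 20}), Action("type", {"text": "hello"})]
    return script[len(history)] if len(history) < len(script) else None


def fake_browser(action):
    return Observation(screenshot=b"", url="https://example.com/after-" + action.name)


start = Observation(screenshot=b"", url="https://example.com")
status, history = run_agent_loop("demo task", fake_model, fake_browser, lambda a: True, start)
print(status, [a.name for a in history])
```

In a real integration, `propose_action` would wrap a call to the model with the `computer_use` tool enabled and `execute_action` would drive an actual browser; the step limit is an extra safeguard added for this sketch and is not something the article attributes to Google.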

“The Gemini 2.5 Computer Use model is primarily optimised for web browsers, but also demonstrates strong promise for mobile UI control tasks. It is not yet optimised for desktop OS-level control,” Google said.

According to the tech giant, the Gemini 2.5 Computer Use model offers leading quality for browser control at the lowest latency, as measured by performance on the Browserbase harness for Online-Mind2Web.

Ayushi Jain
