Google has introduced Gemini 2.5 Computer Use, a new AI model designed to interact directly with web and mobile interfaces. This model, built on Gemini 2.5 Pro’s visual understanding and reasoning capabilities, allows AI agents to perform tasks on computers by clicking, typing, scrolling, and navigating websites, much like a human would.
The Gemini 2.5 Computer Use model is available through the Gemini API in Google AI Studio and Vertex AI. Keep reading to learn how this new AI model works.
The Gemini 2.5 Computer Use model's core capabilities are “exposed through the new `computer_use` tool in the Gemini API and should be operated within a loop,” the tech giant explains in a blog post. Inputs to the tool include the user’s request, a screenshot of the current environment, and a history of recent actions. You can also choose to exclude certain UI actions or add custom ones.
The model then analyses these inputs and responds with a function call, such as clicking or typing. Some actions, like making a purchase, may require user confirmation. The client-side code then executes the action and sends back a new screenshot along with the current URL, restarting the loop. This iterative process continues until the task is finished, an error occurs, or a safety check or the user stops it.
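The loop described above can be sketched in Python. Note that everything here is illustrative: the `call_model` and `execute_action` helpers, the `Action` type, and the confirmation check are hypothetical stand-ins for the real Gemini API client code, not actual SDK calls.

```python
# Illustrative sketch of the agentic loop described above.
# All names (Action, call_model, execute_action) are hypothetical
# stand-ins, NOT the real Gemini API.
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str                            # e.g. "click", "type", "done"
    args: dict = field(default_factory=dict)
    requires_confirmation: bool = False  # e.g. completing a purchase


def call_model(request, screenshot, history):
    """Stand-in for the model call: returns the next UI action.

    A real client would send the user's request, the current
    screenshot, and the recent action history to the computer_use
    tool and parse the returned function call."""
    if len(history) < 2:
        return Action("click", {"x": 100, "y": 200})
    return Action("done")  # model signals the task is complete


def execute_action(action):
    """Stand-in for client-side execution: performs the UI action and
    returns a fresh screenshot plus the current URL."""
    return b"<screenshot bytes>", "https://example.com"


def agent_loop(request, confirm=lambda a: True, max_steps=10):
    """Run the model-act-observe loop until done, declined, or capped."""
    screenshot = b"<initial screenshot>"
    history = []
    for _ in range(max_steps):
        action = call_model(request, screenshot, history)
        if action.name == "done":
            break  # task finished
        if action.requires_confirmation and not confirm(action):
            break  # user declined a sensitive action, e.g. a purchase
        screenshot, url = execute_action(action)
        history.append(action)  # feed recent actions back to the model
    return history
```

The key design point from the article is visible in the loop body: the model never touches the environment directly; the client executes each action and returns fresh observations, so a safety stop or user refusal can interrupt at any step.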
“The Gemini 2.5 Computer Use model is primarily optimised for web browsers, but also demonstrates strong promise for mobile UI control tasks. It is not yet optimised for desktop OS-level control,” Google said.
According to the tech giant, the Gemini 2.5 Computer Use model offers leading quality for browser control at the lowest latency, as measured by performance on the Browserbase harness for Online-Mind2Web.
Ayushi works as Chief Copy Editor at Digit, covering everything from breaking tech news to in-depth smartphone reviews. Prior to Digit, she was part of the editorial team at IANS.