After Anthropic, Google now lets you build AI agents that control your computer with Gemini 3.5 Flash: Here is how

By Siddharth Chauhan | Updated on 25-Jun-2026

HIGHLIGHTS

Computer Use is now a native, built-in capability in Gemini 3.5 Flash

Chrome 149 adds a "Select from screen" tool that lets users drag a box over anything on a webpage

Google has added adversarial training and two enterprise safety systems

Google has built Computer Use directly into Gemini 3.5 Flash, following a similar capability Anthropic introduced for Claude. The feature lets AI agents interact with software the way a person does: by looking at a screen and deciding where to click, what to type and when to scroll, rather than relying on rigid, pre-coded integrations with each individual app. Previously, this capability existed only as a separate, standalone model built on Gemini 2.5. With this update, it is part of the main Gemini 3.5 Flash model itself and is available to developers now through the Gemini API and the Gemini Enterprise Agent Platform.

How it works

The practical use case for Computer Use in Gemini and Claude is automating tasks that would otherwise require a person sitting at a desktop: navigating websites, filling out forms, clicking through multi-step workflows or pulling data from systems that don’t offer a clean API to plug into. Because the model works visually, agents built on it can operate across browser, mobile and desktop environments without needing custom code written for each one. Google is positioning this primarily for long-running enterprise tasks, such as continuous software testing or repetitive knowledge work across professional applications. To help developers get started quickly, Google has set up a live demo environment hosted by Browserbase where the capability can be tested directly.
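For readers who want a feel for what such an agent looks like in code, here is a minimal Python sketch of the look, decide, act cycle. It is not the Gemini API: every name in it (Action, run_agent, scripted_decide) is hypothetical, and the decide callback simply stands in for whatever model call a real agent would make with the goal and the latest screenshot.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical shape, not Google's schema)."""
    kind: str          # "click", "type", "scroll" or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat: capture the screen, ask the model for one action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()               # what the agent "sees"
        action = decide(goal, screenshot, history)   # stand-in for the model call
        if action.kind == "done":                    # model says the task is finished
            break
        execute(action)                              # click, type or scroll for real
        history.append(action)
    return history


def scripted_decide(actions: Iterable[Action]):
    """Test double: replays a fixed list of actions instead of calling a model."""
    remaining = iter(actions)

    def decide(goal: str, screenshot: bytes, history: List[Action]) -> Action:
        return next(remaining, Action("done"))

    return decide
```

In a real agent, take_screenshot and execute would be wired to a browser or desktop automation layer, and the max_steps cap keeps a confused agent from looping forever.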

Alongside the developer-facing update, Chrome 149 introduces a feature called Select from screen, found in the browser’s attachment menu. Clicking it highlights the active tab and lets you drag a selection box over any image or block of text on the page, dropping it straight into a Gemini prompt. It’s a small convenience, but it removes the friction of switching tabs or taking a screenshot just to ask Gemini a question about something you’re looking at.

How safe is Computer Use

Letting an AI agent control a mouse and keyboard raises an immediate question: what happens if it lands on a malicious page with hidden instructions designed to hijack its behaviour? This risk is known as indirect prompt injection. Google says it used targeted adversarial training specifically to harden Gemini 3.5 Flash’s Computer Use capability against this kind of attack. On top of that, it has introduced two optional enterprise safeguards: one that requires explicit human confirmation before the agent takes any sensitive or irreversible action, and another that automatically halts a task the moment it detects a prompt injection attempt. Google also recommends that developers pair these features with sandboxing, human-in-the-loop verification and strict access controls rather than relying on the model’s built-in protections alone.
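To make the two safeguards concrete, here is a rough Python sketch of how a developer might layer similar checks around their own agent. It does not show Google's implementation: the SENSITIVE set, guarded_execute and TaskHalted are hypothetical names, and naive_injection_check is a deliberately crude keyword match used only as a placeholder for a real detector.

```python
from typing import Callable


class TaskHalted(Exception):
    """Raised when the task is stopped because page content looks hostile."""


# Hypothetical list of action kinds that should never run without approval.
SENSITIVE = {"submit_payment", "delete_file", "send_email"}


def naive_injection_check(page_text: str) -> bool:
    """Toy placeholder: flags one well-known injection phrase, nothing more."""
    return "ignore previous instructions" in page_text.lower()


def guarded_execute(
    action_kind: str,
    page_text: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
    injection_detected: Callable[[str], bool] = naive_injection_check,
) -> bool:
    """Run one action only if it passes both checks; return True if it ran."""
    # Safeguard 1: stop the whole task if the page looks like an injection attempt.
    if injection_detected(page_text):
        raise TaskHalted("possible prompt injection; task stopped")
    # Safeguard 2: sensitive or irreversible actions need a human to say yes.
    if action_kind in SENSITIVE and not confirm(action_kind):
        return False
    execute(action_kind)
    return True
```

Here confirm could be as simple as a prompt asking an operator to approve the action. As Google's own guidance suggests, checks like these are meant to sit alongside sandboxing and strict access controls, not replace them.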