Google AI Introduces Gemini 2.5 ‘Computer Use’ (Preview): A Browser-Control Model to Power AI Agents to Interact with User Interfaces

Which of your browser workflows would you delegate today if an agent could plan and execute predefined UI actions? Google AI has introduced Gemini 2.5 Computer Use, a specialized variant of Gemini 2.5 that plans and executes real UI actions in a live browser through a constrained action API. It is available in public preview via Google AI Studio and Vertex AI. The model targets web automation and UI testing, with documented, human-judged gains on standard web/mobile control benchmarks and a safety layer that can require human confirmation for risky steps.
What does the model actually ship?
Developers call a new computer_use tool that returns function calls such as click_at, type_text_at, or drag_and_drop. Client code executes the action (e.g., with Playwright or Browserbase), captures a fresh screenshot and URL, and loops until the task ends or a safety rule blocks it. The supported action space is 13 predefined UI actions (open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, drag_and_drop) and can be extended with custom functions (e.g., open_app, long_press_at, go_home) for non-browser surfaces.
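
The execute-observe loop described above can be sketched in a few lines. This is a minimal illustration, not Google's client code: `Action`, `Observation`, `FakeBrowser`, `scripted_model`, and `run_agent` are hypothetical names, the argument keys (`url`, `x`, `y`) are assumptions, and the fake browser stands in for a real executor such as Playwright.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """One function call proposed by the model (argument names are assumed)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client sends back after each step: current URL and a screenshot."""
    url: str
    screenshot: bytes


class FakeBrowser:
    """Hypothetical stand-in for a real executor; it only records what it is asked to do."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log: list[str] = []

    def execute(self, action: Action) -> None:
        self.log.append(action.name)
        if action.name == "navigate":
            self.url = action.args["url"]

    def observe(self) -> Observation:
        # A real executor would capture an actual screenshot here.
        return Observation(url=self.url, screenshot=b"")


def scripted_model(script: list[Action]) -> Callable[[Observation], Optional[Action]]:
    """Hypothetical model stub: replays a fixed script, then returns None (task done)."""
    remaining = iter(script)
    return lambda observation: next(remaining, None)


def run_agent(next_action: Callable[[Observation], Optional[Action]],
              browser: FakeBrowser, max_steps: int = 10) -> int:
    """Loop: ask for an action, execute it, capture a fresh observation. Returns steps taken."""
    observation = browser.observe()
    for step in range(max_steps):
        action = next_action(observation)
        if action is None:  # the model signals that the task has ended
            return step
        browser.execute(action)
        observation = browser.observe()
    return max_steps


browser = FakeBrowser()
steps = run_agent(
    scripted_model([
        Action("navigate", {"url": "https://example.com"}),
        Action("click_at", {"x": 10, "y": 20}),
    ]),
    browser,
)
```

In a real integration, `next_action` would call the model with the latest screenshot and URL, and `execute` would translate each action name into browser automation calls; the `max_steps` cap is an assumed safeguard against runaway loops.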

What are the scope and constraints?
The model is optimized for web browsers. Google states it is not yet optimized for desktop OS-level control; mobile scenarios work by swapping in custom actions under the same loop. A built-in safety monitor can block prohibited actions or require user confirmation before “high-stakes” operations (payments, sending messages, accessing sensitive data).
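
On the client side, the safety behavior amounts to a gate in front of each action. The sketch below assumes each proposed action arrives with one of three decisions; the `SafetyDecision` enum and `should_execute` helper are hypothetical names for illustration and do not reflect the actual response format.

```python
from enum import Enum
from typing import Callable


class SafetyDecision(Enum):
    """Assumed three-way outcome of the safety monitor for a proposed action."""
    ALLOW = "allow"
    REQUIRE_CONFIRMATION = "require_confirmation"
    BLOCK = "block"


def should_execute(decision: SafetyDecision, confirm: Callable[[], bool]) -> bool:
    """Return True only if the action may run.

    `confirm` asks the human user and returns their answer; it is only
    called when the decision requires confirmation.
    """
    if decision is SafetyDecision.BLOCK:
        return False
    if decision is SafetyDecision.REQUIRE_CONFIRMATION:
        return confirm()
    return True


# Example: a payment step that the user declines is never executed.
user_declines = lambda: False
payment_allowed = should_execute(SafetyDecision.REQUIRE_CONFIRMATION, user_declines)
```

Keeping the confirmation prompt as an injected callback means the same gate can sit behind a CLI prompt, a web dialog, or an automated policy in tests.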
Measured performance
- Online-Mind2Web (official): 69.0% pass@1 (majority-vote human judgments), validated by the benchmark organizers.
- Browserbase matched harness: Leads competing computer-use APIs on both accuracy and latency across Online-Mind2Web and WebVoyager under identical time/step/environment constraints. Google’s model card lists 65.7% (OM2W) and 79.9% (WebVoyager) in Browserbase runs.
- Latency/quality trade-off (Google figure): ~70%+ accuracy at ~225 s median latency on the Browserbase OM2W harness. Treat this as Google-reported, with human evaluation.
- AndroidWorld (mobile generalization): 69.7% as measured by Google, achieved via the same API loop with custom mobile actions added and browser actions excluded.
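
The mobile configuration mentioned in the last bullet can be pictured as set arithmetic over the action space: start from the 13 predefined actions, exclude the browser-specific ones, and add custom functions. Which actions are excluded is not stated in the source, so the `EXCLUDED_FOR_MOBILE` set below is purely an illustrative assumption, as is the `build_action_space` helper.

```python
# The 13 predefined UI actions listed for the computer_use tool.
PREDEFINED_ACTIONS = {
    "open_web_browser", "wait_5_seconds", "go_back", "go_forward", "search",
    "navigate", "click_at", "hover_at", "type_text_at", "key_combination",
    "scroll_document", "scroll_at", "drag_and_drop",
}

# Assumption for illustration only: actions that make sense only in a browser.
EXCLUDED_FOR_MOBILE = {"open_web_browser", "search", "navigate", "go_back", "go_forward"}

# Custom functions named as examples for non-browser surfaces.
CUSTOM_MOBILE_ACTIONS = {"open_app", "long_press_at", "go_home"}


def build_action_space(predefined: set[str], excluded: set[str], custom: set[str]) -> set[str]:
    """Return the action names the client is prepared to execute."""
    unknown = excluded - predefined
    if unknown:
        raise ValueError(f"cannot exclude unknown actions: {sorted(unknown)}")
    return (predefined - excluded) | custom


mobile_actions = build_action_space(
    PREDEFINED_ACTIONS, EXCLUDED_FOR_MOBILE, CUSTOM_MOBILE_ACTIONS
)
```

The client-side executor then needs a handler for each name in the resulting set, while the agent loop itself stays unchanged.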

Early production signals
- Automated UI test repair: Google’s payments platform team reports that the model rehabilitates >60% of previously failing automated UI test executions. This figure comes from (and should be cited to) public reporting rather than the core blog post.
- Operational speed: Poke.com (an early external tester) reports workflows that are often ~50% faster than their next-best alternative.
Editorial Comments
Gemini 2.5 Computer Use is in public preview via Google AI Studio and Vertex AI; it exposes a constrained API with 13 documented UI actions and requires a client-side executor. Google’s materials and the model card report state-of-the-art results on web/mobile control benchmarks, and Browserbase’s matched harness shows ~65.7% pass@1 on Online-Mind2Web with leading latency under identical constraints. The scope is browser-first with per-step safety/confirmation. These data points justify measured evaluation in UI testing and web ops.
The post Google AI Introduces Gemini 2.5 ‘Computer Use’ (Preview): A Browser-Control Model to Power AI Agents to Interact with User Interfaces appeared first on MarkTechPost.