How does Gemini 3.5 Flash Computer Use work?

The model natively sees and interacts with screens, browsers, and mobile devices via the Gemini API, combining vision with tool calls to execute tasks like software testing.

How does Gemini 3.5 Flash compare to GPT-5.5 on OSWorld?

Gemini 3.5 Flash scores 78.4, just 0.3 points behind GPT-5.5 at 78.7, and beats GPT-5.4 mini (72.1) and Gemini 3 Flash (65.1).




Gemini 3.5 Flash Scores 78.4 on OSWorld, Matching GPT-5.5

Google integrated Computer Use into Gemini 3.5 Flash, scoring 78.4 on OSWorld — matching GPT-5.5 and undercutting on cost.

Ala SMITH & AI Research Desk · 10h ago · 2 min read · AI-Generated

Source: the-decoder.com

What is Gemini 3.5 Flash's OSWorld benchmark score with Computer Use?

Google integrated Computer Use into Gemini 3.5 Flash, scoring 78.4 on OSWorld — matching GPT-5.5 (78.7) and trailing Anthropic Opus 4.8 (83.4). The model natively operates screens, browsers, and mobile devices via the Gemini API.

TL;DR

Google bakes Computer Use into Gemini 3.5 Flash. · Scores 78.4 on OSWorld, 0.3 points behind GPT-5.5 (78.7). · Includes adversarial training against prompt injection attacks.

Google baked Computer Use directly into Gemini 3.5 Flash, scoring 78.4 on OSWorld — matching GPT-5.5 (78.7). The model natively sees and operates screens, browsers, and mobile devices via the Gemini API.

Key facts

Gemini 3.5 Flash scores 78.4 on OSWorld.
GPT-5.5 scores 78.7; Anthropic's Opus 4.8 leads at 83.4.
Feature available via Gemini API and Enterprise Agent Platform.
Includes adversarial training for prompt injection defense.
Computer Use was previously available only as a separate Gemini 2.5 model.

Google has integrated "Computer Use" directly into Gemini 3.5 Flash, allowing the model to see, understand, and interact with computers, browsers, and mobile devices autonomously. Previously, this capability was only available as a separate Gemini 2.5 model. Combined with existing tools like function calls, Search, and Maps, developers can now build agents for software testing or office automation across browser, mobile, and desktop environments, according to The Decoder.
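To make the "see, understand, and interact" cycle concrete, here is a minimal sketch of the observe-and-act loop that computer-use agents generally follow. It does not call the Gemini API. The names `Action`, `FakeScreen`, `run_agent`, and `scripted_model` are hypothetical, and the model is replaced by a fixed script so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One UI step proposed by the model (hypothetical shape, not an API type)."""
    kind: str          # e.g. "click", "type", or "done"
    target: str = ""   # description of the element to act on
    text: str = ""     # text to type, if any


@dataclass
class FakeScreen:
    """Stand-in for a real browser or device; it only records what was done."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real environment would return image bytes; a string is enough here.
        return f"screen after {len(self.log)} action(s)"

    def execute(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.target}:{action.text}")


def run_agent(
    goal: str,
    screen: FakeScreen,
    propose_action: Callable[[str, str, List[Action]], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the model for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = screen.screenshot()
        action = propose_action(goal, observation, history)
        if action.kind == "done":
            break
        screen.execute(action)
        history.append(action)
    return history


def scripted_model(goal: str, observation: str, history: List[Action]) -> Action:
    """Placeholder for the model call (assumption): replays a fixed script."""
    script = [
        Action("click", "search box"),
        Action("type", "search box", goal),
        Action("done"),
    ]
    return script[min(len(history), len(script) - 1)]


if __name__ == "__main__":
    screen = FakeScreen()
    run_agent("weather in Paris", screen, scripted_model)
    print(screen.log)
```

In a real integration, `scripted_model` would be replaced by a request that sends the screenshot to the model and parses the returned tool call. The `max_steps` cap is a design choice that keeps a confused agent from looping indefinitely.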

On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1). GPT-5.5 sits just ahead at 78.7, while Anthropic's Opus 4.8 leads at 83.4. Sonnet 4.6 also hits 78.4, and Gemini 3.1 Pro lands at 76.2. The benchmark measures an agent's ability to complete real-world computer tasks like file manipulation and web navigation.
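The gaps between these scores are easy to misstate, so a short script helps check them. The numbers below are the ones reported above; the helper name `gap` is only illustrative.

```python
# OSWorld scores as reported in this article.
scores = {
    "Opus 4.8": 83.4,
    "GPT-5.5": 78.7,
    "Gemini 3.5 Flash": 78.4,
    "Sonnet 4.6": 78.4,
    "Gemini 3.1 Pro": 76.2,
    "GPT-5.4 mini": 72.1,
    "Gemini 3 Flash": 65.1,
}


def gap(a: str, b: str) -> float:
    """Score of model `a` minus score of model `b`, rounded to one decimal."""
    return round(scores[a] - scores[b], 1)


# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for name, score in ranked:
        delta = gap(name, "Gemini 3.5 Flash")
        print(f"{name:<18} {score:5.1f}  ({delta:+.1f} vs Gemini 3.5 Flash)")
```

On these figures, Gemini 3.5 Flash trails GPT-5.5 by 0.3 points and Opus 4.8 by 5.0 points, and leads Gemini 3 Flash by 13.3 points.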

Security and Deployment

To guard against prompt injection attacks, Google uses adversarial training and two optional enterprise safeguards. One requires user confirmation for sensitive or irreversible actions, while the other automatically stops tasks when it detects indirect prompt injections. Google also recommends sandboxing, human oversight, and strict access controls, with more details in its best practices documentation. The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. A Browserbase demo and a GitHub reference implementation are also available.
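The two safeguards described above can be pictured as gates placed in front of each action. The sketch below is a client-side illustration under stated assumptions, not Google's implementation: the action names in `SENSITIVE_KINDS`, the keyword list in `INJECTION_MARKERS`, and the function `guarded_execute` are all hypothetical, and a keyword match is a deliberately crude stand-in for real injection detection.

```python
from typing import Callable

# Hypothetical action names that should require explicit user confirmation.
SENSITIVE_KINDS = {"delete", "purchase", "send_email"}

# Toy heuristic (assumption): phrases that suggest on-screen text is trying
# to redirect the agent. Real detectors are far more sophisticated.
INJECTION_MARKERS = (
    "ignore previous instructions",
    "disregard your instructions",
)


class TaskStopped(Exception):
    """Raised when a run is halted because screen content looks like an injection."""


def looks_like_injection(screen_text: str) -> bool:
    lowered = screen_text.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)


def guarded_execute(
    action_kind: str,
    screen_text: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run one action behind both gates; return True if it was executed."""
    # Gate 1: stop the whole task if the screen content looks hostile.
    if looks_like_injection(screen_text):
        raise TaskStopped("suspicious instructions found in screen content")
    # Gate 2: sensitive or irreversible actions need a human "yes".
    if action_kind in SENSITIVE_KINDS and not confirm(action_kind):
        return False
    execute(action_kind)
    return True
```

Raising an exception for a suspected injection, rather than just skipping the action, mirrors the "stop the task" behavior described above: once untrusted screen content is in play, later steps should not continue automatically.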

The move follows Google's broader push to embed agentic capabilities directly into its models rather than requiring separate orchestration layers — a pattern also seen in OpenAI's GPT-5.5 and Anthropic's Claude Opus. By folding Computer Use into the cheaper Flash tier, Google undercuts competitors on price while narrowing the gap on agentic benchmarks.

What to watch

Watch for enterprise adoption metrics in Google Cloud's next quarterly earnings, and whether Anthropic or OpenAI respond with lower-tier models matching Flash's price-performance on OSWorld. A direct head-to-head with GPT-5.5-mini would clarify the agentic cost curve.


Sources cited in this article

The Decoder
Flash


AI-assisted reporting. Generated by gentic.news from 2 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The integration of Computer Use into Gemini 3.5 Flash marks a structural shift in how Google packages agentic capability. By baking screen control into the base model rather than offering it as a separate fine-tune, Google reduces latency and complexity for developers, a direct response to Anthropic's Claude Opus and OpenAI's GPT-5.5 agent features. The 78.4 OSWorld score is telling: it's within 0.3 points of GPT-5.5, but Flash is Google's low-cost tier, meaning the price-to-performance ratio likely favors Google for high-volume agentic workflows.

Notably, the gap between top models has collapsed. The spread from Gemini 3 Flash (65.1) to Opus 4.8 (83.4) is just 18.3 points, with four models (GPT-5.5, Gemini 3.5 Flash, Sonnet 4.6, and Gemini 3.1 Pro) clustered within 2.5 points of each other. This suggests OSWorld may be nearing saturation for current architectures, or that the benchmark rewards similar design choices across labs. Google's adversarial training against prompt injection is a practical differentiator, as computer-use agents are uniquely vulnerable to indirect attacks via screen content.

Google's timing is aggressive: the company committed $11B/year to SpaceX compute and $14B to Anthropic in the same month, signaling a dual strategy of building in-house while betting on competitors. The Flash-tier Computer Use could cannibalize demand for Anthropic's Opus if enterprise customers prioritize cost over the 5-point OSWorld gap.

