Google baked Computer Use directly into Gemini 3.5 Flash, which scores 78.4 on OSWorld, just behind GPT-5.5 (78.7). The model natively sees and operates screens, browsers, and mobile devices via the Gemini API.
Key facts
- Gemini 3.5 Flash scores 78.4 on OSWorld.
- GPT-5.5 scores 78.7; Anthropic's Opus 4.8 leads at 83.4.
- Feature available via Gemini API and Enterprise Agent Platform.
- Includes adversarial training for prompt injection defense.
- Previously available only as a separate Gemini 2.5 model.
Google has integrated "Computer Use" directly into Gemini 3.5 Flash, allowing the model to see, understand, and interact with computers, browsers, and mobile devices autonomously. Previously, this capability was only available as a separate Gemini 2.5 model. Combined with existing tools like function calls, Search, and Maps, developers can now build agents for software testing or office automation across browser, mobile, and desktop environments, according to The Decoder.
On the OSWorld benchmark, Gemini 3.5 Flash scores 78.4, beating Gemini 3 Flash (65.1) and GPT-5.4 mini (72.1). GPT-5.5 sits just ahead at 78.7, while Anthropic's Opus 4.8 leads at 83.4. Sonnet 4.6 also hits 78.4, and Gemini 3.1 Pro lands at 76.2. The benchmark measures an agent's ability to complete real-world computer tasks like file manipulation and web navigation.
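To make the spread easier to read, the reported scores can be turned into point gaps relative to Gemini 3.5 Flash. The snippet below is only an illustrative calculation: the numbers are the ones quoted above, while the variable names are ours.

```python
# OSWorld scores as quoted in this article.
scores = {
    "Opus 4.8": 83.4,
    "GPT-5.5": 78.7,
    "Gemini 3.5 Flash": 78.4,
    "Sonnet 4.6": 78.4,
    "Gemini 3.1 Pro": 76.2,
    "GPT-5.4 mini": 72.1,
    "Gemini 3 Flash": 65.1,
}

# Point gap of each model relative to Gemini 3.5 Flash.
baseline = scores["Gemini 3.5 Flash"]
gaps = {name: round(score - baseline, 1) for name, score in scores.items()}

# Print models from largest lead to largest deficit.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {scores[name]:5.1f}  {gap:+.1f}")
```

On these figures, Gemini 3.5 Flash trails GPT-5.5 by 0.3 points and Opus 4.8 by 5.0 points, and sits 13.3 points above Gemini 3 Flash.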
Security and deployment
To guard against prompt injection attacks, Google uses adversarial training and two optional enterprise safeguards. One requires user confirmation for sensitive or irreversible actions, while the other automatically stops tasks when it detects indirect prompt injections. Google also recommends sandboxing, human oversight, and strict access controls, with more details in its best practices documentation. The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. A Browserbase demo and a GitHub reference implementation are also available.
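The first safeguard, user confirmation before sensitive or irreversible actions, can be pictured as a small gate in the agent loop. The sketch below is a hypothetical illustration and not Google's implementation or the Gemini API: the `Action` type, the `SENSITIVE_KINDS` set, and the `run_action` helper are all names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds treated as sensitive; not taken from the Gemini API.
SENSITIVE_KINDS = {"delete_file", "submit_payment", "send_email"}


@dataclass
class Action:
    kind: str    # what the agent wants to do, e.g. "click" or "delete_file"
    target: str  # what it wants to act on


def run_action(
    action: Action,
    confirm: Callable[[Action], bool],
    execute: Callable[[Action], str],
) -> str:
    """Run an action, asking for confirmation first if it is sensitive."""
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        # The user declined, so the action is never executed.
        return f"blocked: {action.kind}"
    return execute(action)


def fake_execute(action: Action) -> str:
    # Stand-in for a real executor that would drive a browser or desktop.
    return f"done: {action.kind}"


# A routine click runs without asking; a deletion runs only if confirmed.
print(run_action(Action("click", "Save button"), lambda a: False, fake_execute))
print(run_action(Action("delete_file", "report.docx"), lambda a: False, fake_execute))
```

In a real deployment the `confirm` callback would prompt a human, and the list of sensitive actions would come from policy rather than a hard-coded set.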
The move follows Google's broader push to embed agentic capabilities directly into its models rather than requiring separate orchestration layers — a pattern also seen in OpenAI's GPT-5.5 and Anthropic's Claude Opus. By folding Computer Use into the cheaper Flash tier, Google undercuts competitors on price while narrowing the gap on agentic benchmarks.
What to watch
Watch for enterprise adoption metrics in Google Cloud's next quarterly earnings, and whether Anthropic or OpenAI respond with lower-tier models matching Flash's price-performance on OSWorld. A direct head-to-head with GPT-5.5-mini would clarify the agentic cost curve.
Source: the-decoder.com








