gentic.news — AI News Intelligence Platform
Screen-level OS control · #6 of 14 in category

Gemini 3.1 Pro

Google DeepMind · Launched Feb 2026

Google's most advanced model; strongest in agentic browsing (BrowseComp 85.9%, 1.0pp behind SOTA) and scores 76.2% on OSWorld-Verified.

Benchmarks scored: 5
Peak score: 85.9
Article mentions: 0
Open source: No

Benchmark performance

BrowseComp

OpenAI's 1,266 hard browsing problems that reward research depth and factual grounding rather than shallow navigation.

Score: 85.9
Gap to SOTA: -1.0pp (held by Claude Mythos Preview) · Human: 80%

SWE-Bench Verified

OpenAI-verified 500-issue subset of SWE-Bench. Approaching saturation in 2026: most frontier models clear 80%.

Score: 80.6
Gap to SOTA: -14.4pp (held by Claude Fable 5)

OSWorld-Verified

369 real tasks on a live Ubuntu desktop VM: file I/O, spreadsheets, creative apps, settings. The July 2025 Verified rebuild moved to AWS (50x parallel) and fixed 300+ task bugs. The flagship computer-use benchmark.

Score: 76.2
Gap to SOTA: -7.2pp (held by Claude Opus 4.8) · Human: 72.4%

Terminal-Bench 2.1

Held-out, contamination-resistant CLI tasks driven end-to-end in a real terminal. Version 2.1 is the 2026 standard for terminal autonomy.

Score: 70.7
Gap to SOTA: -12.7pp (held by Codex CLI (GPT-5.5))

SWE-Bench Pro

Harder, contamination-resistant successor to SWE-Bench Verified: real GitHub issues with held-out tests. Where coding headroom remains.

Score: 54.2
Gap to SOTA: -15.0pp (held by Claude Opus 4.8)
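
Each benchmark card reports a score and a gap to SOTA in percentage points. As a rough sketch, assuming the gap is defined as this model's score minus the SOTA score (the page does not state the formula), the implied SOTA score can be recovered as score minus gap. The `cards` dictionary and the `implied_sota` helper are illustrative names, not part of the platform.

```python
# Scores and gaps copied from the benchmark cards above.
# Each value is (score, gap to SOTA in percentage points).
cards = {
    "BrowseComp": (85.9, -1.0),
    "SWE-Bench Verified": (80.6, -14.4),
    "OSWorld-Verified": (76.2, -7.2),
    "Terminal-Bench 2.1": (70.7, -12.7),
    "SWE-Bench Pro": (54.2, -15.0),
}


def implied_sota(score: float, gap_pp: float) -> float:
    """Return the SOTA score implied by a card.

    Assumption: gap = score - SOTA, so SOTA = score - gap.
    The result is rounded to one decimal to match the page's precision.
    """
    return round(score - gap_pp, 1)


implied = {name: implied_sota(score, gap) for name, (score, gap) in cards.items()}

for name, sota in implied.items():
    print(f"{name}: implied SOTA {sota}")
```

Under that assumption the BrowseComp card implies a SOTA score of 86.9, which matches the peak listed for Claude Mythos Preview in the category table.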

Other screen-level OS control agents

The 14 agents in this category, ranked by peak benchmark.

Agent                  Maker      Launch   Peak    Pricing
Claude Sonnet 4.6      Anthropic  2026-02  1470.0  $3 / $15 per M tokens
Claude Opus 4.8        Anthropic  2026-05  88.6    $5 / $25 per M tokens
UI-TARS-2 (OSS)        ByteDance  2025-09  88.2    Open weights
Claude Opus 4.7        Anthropic  2026-03  87.6    $5 / $25 per M tokens
Claude Mythos Preview  Anthropic  2026-04  86.9    Research preview
GPT-5.5                OpenAI     2026-05  82.6    OpenAI API
Holo3-35B-A3B          H Company  2026-04  82.6    H Company
GUI-Owl-1.5 (OSS)      Alibaba    2026-02  80.3    Open weights
Holo3-122B-A10B        H Company  2026-04  78.8    H Company
GPT-5.4                OpenAI     2026-03  75.0    OpenAI API

Quick facts

Type: Screen-level OS control
Maker: Google DeepMind
Launch: 2026-02-19
Open source: No
Pricing: Google API
Benchmarks scored: 5
Article mentions: 0
Rank in category: #6 of 14