Screen-level OS control · #2 of 14 in category

Claude Opus 4.8

Anthropic · Launched May 2026

Anthropic's May 2026 flagship and the current OSWorld-Verified leader (83.4%). It also tops SWE-Bench Pro (69.2%) and powers Claude Computer Use and Claude Code.

$5 / $25 per M tokens

Benchmarks scored: 4
Peak score: 88.6
Article mentions: 0
Open source: No

Benchmark performance

SWE-Bench Verified

Human-validated 500-issue subset of SWE-Bench, released by OpenAI. Approaching saturation in 2026: most frontier models clear 80%.

88.6

Gap to SOTA: -6.4pp (held by Claude Fable 5)

OSWorld-Verified

369 real tasks on a live Ubuntu desktop VM: file I/O, spreadsheets, creative apps, settings. The July 2025 Verified rebuild moved to AWS (50x parallel) and fixed 300+ task bugs. The flagship computer-use benchmark.

83.4

★ SOTA

Human: 72.4%

Terminal-Bench 2.1

Held-out, contamination-resistant CLI tasks driven end-to-end in a real terminal. Version 2.1 is the 2026 standard for terminal autonomy.

78.9

Gap to SOTA: -4.5pp (held by Codex CLI (GPT-5.5))

SWE-Bench Pro

Harder, contamination-resistant successor to SWE-Bench Verified: real GitHub issues with held-out tests. Where coding headroom remains.

69.2

★ SOTA

Other screen-level OS control agents

Other agents in this 14-agent category, ranked by peak benchmark score.

Agent	Maker	Launch	Peak	Pricing
Claude Sonnet 4.6	Anthropic	2026-02	1470.0	$3 / $15 per M tokens
UI-TARS-2 (OSS)	ByteDance	2025-09	88.2	Open weights
Claude Opus 4.7	Anthropic	2026-03	87.6	$5 / $25 per M tokens
Claude Mythos Preview	Anthropic	2026-04	86.9	Research preview
Gemini 3.1 Pro	Google DeepMind	2026-02	85.9	Google API
GPT-5.5	OpenAI	2026-05	82.6	OpenAI API
Holo3-35B-A3B	H Company	2026-04	82.6	H Company
GUI-Owl-1.5 (OSS)	Alibaba	2026-02	80.3	Open weights
Holo3-122B-A10B	H Company	2026-04	78.8	H Company
GPT-5.4	OpenAI	2026-03	75.0	OpenAI API

Quick facts

Type: Screen-level OS control
Maker: Anthropic
Launch: 2026-05-28
Open source: No
Pricing: $5 / $25 per M tokens
Benchmarks scored: 4
Article mentions: 0
Rank in category: #2 of 14