Screen-level OS control · #1 of 3 in category

Claude Sonnet 4.6

Anthropic · Launched Feb 2026

Anthropic's fast mid-tier model; sits right on the human OSWorld-Verified baseline at 72.1%.

Peak score: 1470.0

Benchmark performance (scores, highest first): 1470.0, 85.0, 79.6, 72.1

The 3 agents in this category, ranked by peak benchmark.

Agent	Maker	Launch	Peak	Pricing
Claude Mythos Preview	Anthropic	2026-04	86.9	Research preview
Claude Sonnet 4.5	Anthropic	2025-09	62.9	Legacy Anthropic API

2026-06-30

Claude Sonnet 5 Beats Opus 4.8 on Knowledge Work at Lower Cost

2026-06-15

Anthropic Blocks Claude from Outputting GPL, Apache, 7 Other Licenses

2026-06-08

Anthropic: AI agents fail biology retrieval, miss 261 Ebola sequences

2026-06-04

Ontology-Grounded AI Agent Testing Hits 48.3% Regulatory Coverage vs.

2026-05-14

Anthropic Deprecates Fixed Thinking Budgets, Forces Adaptive Mode

2026-04-23

3 Ways to Switch Claude Code Models Instantly: /model, --flag, and ENV Variables

2026-04-17

Navox Agents: 8 Specialized Claude Code Agents with Human Checkpoints

2026-04-16

Claude Code's Edge: Why Sonnet 4.5 Beats GPT-4o for Multi-File Projects