Claude Sonnet 4.6
Anthropic's fast mid-tier model; sits right on the human OSWorld-Verified baseline at 72.1%.
Claude Sonnet 4.6 scores 72.1% on OSWorld-Verified, matching the human baseline. It deploys Chain-of-Thought Prompting and Constitutional AI. Its mention velocity is low, however: only 3 mentions in the last 30 days. Recent news points to a weakness in agentic retrieval: Anthropic's own research shows AI agents missing as many as 261 of 266 Ebola sequences in a biology retrieval task. The article listed under Recent Articles names Claude Sonnet 4, so the link to Sonnet 4.6 is presumed rather than shown. The model is used by King's College London, Navox Agents, and Claude Code, but faces pressure from the shift to adaptive thinking budgets, with fixed budgets deprecated as of May 2026. The open question is whether Sonnet 4.6 can hold its baseline parity as competitors push beyond human-level performance.
- Scores 72.1% on OSWorld-Verified, matching the human baseline.
- Deploys Chain-of-Thought Prompting and Constitutional AI.
- Recent research reveals failure in biology retrieval (missed 261 Ebola sequences).
- Low mention velocity: 3 mentions in 30 days.
- Used by King's College London, Navox Agents, and Claude Code.
Signal Radar
Five-axis snapshot of this entity's footprint
Mentions × Lab Attention
Weekly mentions (solid) and average article relevance (dotted)
Timeline (7)
- Research Milestone (Apr 16, 2026): Outperformed GPT-4o in real-world tests on multi-file development tasks.
- Research Milestone (Apr 11, 2026): Independent benchmarks validate Claude Sonnet 4.6 as a top-tier model for complex reasoning and coding tasks.
- Research Milestone (Apr 6, 2026): Showed only 3.7% self-preservation bias in a study testing AI deception, the lowest among prominent models tested.
- Research Milestone (Mar 26, 2026): Used in a prompt compression study analyzing 358 successful runs from 1,199 real orchestration instructions.
- Product Launch (Mar 20, 2026): Anthropic released Claude Sonnet 4.6 with native chain-of-thought reasoning mode for complex coding tasks.
- Product Launch (Mar 17, 2026): Service disruption with elevated error rates reported on status page.
Relationships (6)
Developed
Uses
Deploys
Frequently appears with (9)
Entities that show up in the same articles: shared coverage, not a stated relationship.
Recent Articles (2)
- Anthropic: AI agents fail biology retrieval, miss 261 Ebola sequences (87 relevance)
  Anthropic research shows Claude Sonnet 4 returning 5–106 Ebola sequences instead of 266, shifting outbreak origin from 2014 to 1922. Repeatable retrie…
- Ontology-Grounded AI Agent Testing Hits 48.3% Regulatory Coverage vs. (88 relevance)
  Ontology-grounded AI agent testing achieves 48.3% regulatory coverage vs. 33.1% baseline in 1800-scenario pilot. Coverage advantage over RAG not robus…
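The headline figure of 261 missed sequences matches the worst case in the reported range (5 returned out of 266). A minimal Python sketch of that arithmetic follows, assuming 266 is the full reference set; the helper name `retrieval_gap` is illustrative and does not come from the article.

```python
def retrieval_gap(returned: int, expected: int) -> tuple[int, float]:
    """Return (missed count, recall) for one retrieval run."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if not 0 <= returned <= expected:
        raise ValueError("returned must be between 0 and expected")
    missed = expected - returned
    recall = returned / expected
    return missed, recall


EXPECTED = 266  # sequences the agent should have retrieved, per the article summary

# Lower and upper ends of the reported 5-106 range.
for returned in (5, 106):
    missed, recall = retrieval_gap(returned, EXPECTED)
    print(f"returned={returned:>3}  missed={missed:>3}  recall={recall:.1%}")
```

Under these assumptions, even the best reported run (106 returned) still misses 160 sequences, so the 261 figure should be read as an upper bound on the shortfall.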
Predictions
No predictions linked to this entity.
AI Discoveries (1)
- observation, active, 5d ago
Lifecycle: Claude Sonnet 4.6
Claude Sonnet 4.6 is in the 'active' phase (1 mention in the last 3 days, 2 in the last 14 days, 25 total)
90% confidence
Sentiment History
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W17 | 0.30 | 1 |
| 2026-W20 | -0.10 | 1 |
| 2026-W23 | 0.00 | 1 |
| 2026-W24 | -0.60 | 1 |
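
The weekly rows can be collapsed into a single mention-weighted average. The sketch below assumes that weighting each week by its mention count is a reasonable summary; the page itself does not say how it aggregates sentiment, and the name `weighted_sentiment` is illustrative.

```python
# (week, average sentiment, mentions), copied from the table above
rows = [
    ("2026-W17", 0.30, 1),
    ("2026-W20", -0.10, 1),
    ("2026-W23", 0.00, 1),
    ("2026-W24", -0.60, 1),
]


def weighted_sentiment(rows):
    """Mention-weighted mean of the weekly average sentiment."""
    total_mentions = sum(mentions for _, _, mentions in rows)
    if total_mentions == 0:
        raise ValueError("no mentions to average over")
    weighted_sum = sum(score * mentions for _, score, mentions in rows)
    return weighted_sum / total_mentions


print(f"{weighted_sentiment(rows):+.2f}")  # prints -0.10
```

Because every listed week has exactly one mention, the weighted mean equals the plain mean here. With only four data points, the result is a rough indicator and not a trend.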