How does Sonnet 5 compare to Opus 4.8 on benchmarks?

Sonnet 5 beats Opus 4.8 on GDPval-AA v2 (1,618 vs 1,615) and nearly matches it on Humanity's Last Exam (57.4% vs 57.9%).

Why is cybersecurity mentioned in the launch?

The US government blocked Anthropic's Mythos 5 and Fable 5 models over cybersecurity concerns, so Anthropic is proactively addressing similar risks for Sonnet 5.

Claude Sonnet 5 Beats Opus 4.8 on Knowledge Work at Lower Cost

Anthropic released Claude Sonnet 5, which beats Sonnet 4.6 across all benchmarks and edges past Opus 4.8 on GDPval-AA v2 with a score of 1,618.

Ala SMITH & AI Research Desk · 17h ago · 3 min read · AI-Generated

Source: the-decoder.com via the_decoder, engadget, gn_claude_model, nvidia_blog, @rohanpaul_ai · Widely Reported

How does Claude Sonnet 5 compare to Opus 4.8 on benchmarks?

Anthropic released Claude Sonnet 5, which scores 1,618 on GDPval-AA v2, edging past Opus 4.8's 1,615. It beats Sonnet 4.6 across all benchmarks and is available now at an introductory price.

TL;DR

Sonnet 5 beats Opus 4.8 on GDPval-AA v2. · SWE-bench Pro score: 63.2%, up from 58.1% for Sonnet 4.6. · Available now at an introductory discount through August 2026.

Anthropic released Claude Sonnet 5, which scores 1,618 on the GDPval-AA v2 benchmark, beating the larger Opus 4.8 at 1,615. The model is available now at an introductory discount through August 2026.

Key facts

Sonnet 5 scores 1,618 on GDPval-AA v2, beating Opus 4.8.
SWE-bench Pro: 63.2% (Sonnet 5) vs 58.1% (Sonnet 4.6).
Terminal-Bench 2.1: 80.4% (Sonnet 5) vs 67.0% (Sonnet 4.6).
OSWorld-Verified: 81.2% (Sonnet 5) vs 78.5% (Sonnet 4.6).
Available at an introductory discount through August 2026.

Anthropic released Claude Sonnet 5, which the company calls its most agentic Sonnet yet, according to The Decoder. The model can build plans on its own and use tools like browsers and terminals, closing the gap to the pricier Opus series.

Benchmark gains across the board

Anthropic's published benchmarks show Sonnet 5 beating its predecessor Sonnet 4.6 in every tested category while gaining ground on Opus 4.8, according to The Decoder. On agentic coding, Sonnet 5 hits 63.2 percent on SWE-bench Pro, up from 58.1 percent for Sonnet 4.6; Opus 4.8 sits at 69.2 percent. On Terminal-Bench 2.1, Sonnet 5 scores 80.4 percent versus Sonnet 4.6's 67.0 percent. For multidisciplinary reasoning (Humanity's Last Exam), the model reaches 57.4 percent with tools, nearly matching Opus 4.8 at 57.9 percent. On computer use (OSWorld-Verified), Sonnet 5 posts 81.2 percent compared to 78.5 percent for its predecessor.
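For readers who want the gaps rather than the raw scores, the figures in the paragraph above can be turned into percentage-point differences. The following is a minimal sketch in Python; the dictionary layout and function names are illustrative choices, not anything published by Anthropic, and `None` marks a score the article does not report.

```python
# Scores (percent) exactly as reported in the article.
# None means the article gives no figure for that model on that benchmark.
SCORES = {
    "SWE-bench Pro": {"sonnet_5": 63.2, "sonnet_4_6": 58.1, "opus_4_8": 69.2},
    "Terminal-Bench 2.1": {"sonnet_5": 80.4, "sonnet_4_6": 67.0, "opus_4_8": None},
    "Humanity's Last Exam (with tools)": {"sonnet_5": 57.4, "sonnet_4_6": None, "opus_4_8": 57.9},
    "OSWorld-Verified": {"sonnet_5": 81.2, "sonnet_4_6": 78.5, "opus_4_8": None},
}


def point_gap(a, b):
    """Return a - b in percentage points (1 decimal), or None if a score is missing."""
    if a is None or b is None:
        return None
    return round(a - b, 1)


def summarize(scores):
    """Build one row per benchmark with Sonnet 5's gain and its gap to Opus 4.8."""
    rows = []
    for name, s in scores.items():
        rows.append({
            "benchmark": name,
            "gain_vs_sonnet_4_6": point_gap(s["sonnet_5"], s["sonnet_4_6"]),
            "gap_to_opus_4_8": point_gap(s["sonnet_5"], s["opus_4_8"]),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(SCORES):
        print(row)
```

On these numbers the largest reported jump over Sonnet 4.6 is on Terminal-Bench 2.1, while the gap to Opus 4.8 is about six points on SWE-bench Pro and half a point on Humanity's Last Exam.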

On the knowledge work benchmark GDPval-AA v2, which tests AI on real-world knowledge tasks, Sonnet 5 actually beats the larger Opus 4.8, scoring 1,618 to Opus's 1,615. Anthropic says feedback from early-access partners told the same story: Sonnet 5 acts far more agentically than previous versions, which shows up in, for example, how it handles search tasks.

Cybersecurity context

This launch comes as the US government blocks two of Anthropic's most capable models, Mythos 5 and Fable 5, over cybersecurity concerns. Anthropic is clearly eager to get ahead of any similar worries. The model wasn't trained on cybersecurity tasks, the company says, and in tests for risky capabilities like writing software exploits, it scores far below both Opus 4.8 and Mythos 5.

Sonnet 5 does score a bit higher than its predecessor on these tasks, though. So Anthropic has switched on cyber safeguards by default. They flag and block risky cyber usage in real time, on par with the protections already in place for Claude Opus 4.7 and 4.8. They're dialed back compared to Fable 5's guardrails, which users complained about almost immediately. Anthropic says it views the overall cybersecurity risk from Sonnet 5 as low.

The model is available now on all Anthropic platforms at an introductory discount, with pricing rising to standard Sonnet rates after August 2026.

What to watch

Watch for enterprise adoption metrics after the introductory discount ends in August 2026. Also monitor whether the US government imposes any restrictions on Sonnet 5 given its improved agentic capabilities, and whether Anthropic releases a new Opus model to maintain distance.

Firefox 147 exploit evaluation. Like its predecessor Sonnet 4.6, Sonnet 5 couldn't develop a fully working exploit but shows a slightly higher partial

Source: the-decoder.com

[Updated 01 Jul via the_decoder]

However, developer Simon Willison noted that Sonnet 5 uses a new tokenizer that produces roughly 30% more tokens for the same English text compared to Sonnet 4.6, effectively raising costs by about 40%. The model also drops support for the sampling parameters temperature, top_p, and top_k, and has adaptive thinking enabled by default. It offers a 1 million token context window and a maximum of 128,000 output tokens.
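The link between token inflation and cost can be worked through with simple arithmetic. The sketch below assumes cost scales linearly with token count; the article does not say how the roughly 40% figure is derived, so the price ratio is left as an explicit parameter rather than guessed. All function names are hypothetical, and the 1.30 and 1.40 inputs are the article's approximate figures, not measured values.

```python
def effective_cost_multiplier(token_ratio: float, price_ratio: float = 1.0) -> float:
    """Relative cost of the same text on the new model versus the old one.

    token_ratio: new tokens / old tokens for identical text (about 1.30 per the article).
    price_ratio: new per-token price / old per-token price (assumed 1.0 if unchanged).
    """
    if token_ratio <= 0 or price_ratio <= 0:
        raise ValueError("ratios must be positive")
    return token_ratio * price_ratio


def implied_price_ratio(token_ratio: float, cost_multiplier: float) -> float:
    """Per-token price ratio that would be needed to reach a given overall cost multiplier."""
    if token_ratio <= 0:
        raise ValueError("token_ratio must be positive")
    return cost_multiplier / token_ratio


def context_in_old_tokens(context_window: int, token_ratio: float) -> int:
    """Amount of text that fits in the window, measured in old-tokenizer tokens."""
    if token_ratio <= 0:
        raise ValueError("token_ratio must be positive")
    return int(context_window / token_ratio)


if __name__ == "__main__":
    # With per-token prices unchanged, 30% more tokens means 30% more cost.
    print(effective_cost_multiplier(1.30))
    # Price ratio that would turn a 1.30x token count into a 1.40x bill.
    print(round(implied_price_ratio(1.30, 1.40), 3))
    # Text capacity of a 1,000,000-token window in old-tokenizer terms.
    print(context_in_old_tokens(1_000_000, 1.30))
```

Under these assumptions, token inflation alone accounts for a 30% increase, so the remaining difference up to 40% would have to come from something the article does not spell out, such as pricing or output length. The same ratio also means a 1 million token window holds less English text than the headline number suggests when compared against the older tokenizer.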

Sources cited in this article

The Decoder
Simon Willison

AI-assisted reporting. Generated by gentic.news from 2 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The key structural takeaway here is that Anthropic is compressing performance down the model stack. Sonnet 5 now matches or exceeds Opus 4.8 on real-world knowledge work at a presumably lower inference cost. This mirrors a broader industry trend where mid-tier models catch up to flagship models within 6-9 months, forcing companies to either push further on scaling or differentiate on cost and speed.

The cybersecurity context is equally important. The US government blocking Mythos 5 and Fable 5 creates a regulatory overhang that Anthropic must navigate carefully. By explicitly stating Sonnet 5 wasn't trained on cybersecurity tasks and scores low on risky capabilities, Anthropic is signaling to regulators that this model is safe to deploy. The proactive cyber safeguards suggest the company learned from the Fable 5 backlash, where users complained about overly restrictive guardrails.

Compared to the recent Claude Code updates and the Azure deployment on Nvidia GB300 Blackwell, Sonnet 5 represents a more incremental but strategically important release. It fills the gap between the accessible Sonnet line and the premium Opus line, potentially making agentic capabilities available to a broader set of enterprise customers without the Opus price tag.
