gentic.news — AI News Intelligence Platform
Products & Launches · Breakthrough · Score: 100

Claude Sonnet 5 Beats Opus 4.8 on Knowledge Work at Lower Cost

Anthropic released Claude Sonnet 5, which beats Sonnet 4.6 across all benchmarks and edges past Opus 4.8 on GDPval-AA v2 with a score of 1,618.

17h ago · 3 min read · AI-Generated
Source: the-decoder.com via the_decoder, engadget, gn_claude_model, nvidia_blog, @rohanpaul_ai · Widely Reported
How does Claude Sonnet 5 compare to Opus 4.8 on benchmarks?

Anthropic released Claude Sonnet 5, which scores 1,618 on GDPval-AA v2, edging past Opus 4.8's 1,615. It beats Sonnet 4.6 across all benchmarks and is available now at an introductory price.

TL;DR

Sonnet 5 beats Opus 4.8 on GDPval-AA v2. · SWE-bench Pro score 63.2%, up from 58.1%. · Available now at introductory discount through August.

Anthropic released Claude Sonnet 5, which scores 1,618 on the GDPval-AA v2 benchmark, beating the larger Opus 4.8 at 1,615. The model is available now at an introductory discount through August 2026.

Key facts

  • Sonnet 5 scores 1,618 on GDPval-AA v2, beating Opus 4.8.
  • SWE-bench Pro: 63.2% (Sonnet 5) vs 58.1% (Sonnet 4.6).
  • Terminal-Bench 2.1: 80.4% (Sonnet 5) vs 67.0% (Sonnet 4.6).
  • OSWorld-Verified: 81.2% (Sonnet 5) vs 78.5% (Sonnet 4.6).
  • Available at introductory discount through August 2026.

Anthropic released Claude Sonnet 5, which the company calls its most agentic Sonnet yet, according to The Decoder. The model can build plans on its own and use tools like browsers and terminals, closing the gap to the pricier Opus series.

Benchmark gains across the board

Anthropic's published benchmarks show Sonnet 5 beating its predecessor Sonnet 4.6 in every tested category while gaining ground on Opus 4.8 [per the article]. On agentic coding, Sonnet 5 reaches 63.2 percent on SWE-bench Pro, up from 58.1 percent for Sonnet 4.6; Opus 4.8 sits at 69.2 percent. On Terminal-Bench 2.1, Sonnet 5 scores 80.4 percent versus Sonnet 4.6's 67.0 percent. For multidisciplinary reasoning (Humanity's Last Exam), Sonnet 5 reaches 57.4 percent with tools, nearly matching Opus 4.8 at 57.9 percent. On computer use (OSWorld-Verified), Sonnet 5 posts 81.2 percent compared to 78.5 percent for its predecessor.
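The gaps between the two Sonnet generations can be checked with a few lines of arithmetic. The sketch below uses only the scores quoted above; the names SCORES, point_gain, and relative_gain are illustrative and not taken from any Anthropic tooling.

```python
# Scores in percent as reported in the article: (Sonnet 5, Sonnet 4.6).
# The dictionary and function names are illustrative assumptions.
SCORES = {
    "SWE-bench Pro": (63.2, 58.1),
    "Terminal-Bench 2.1": (80.4, 67.0),
    "OSWorld-Verified": (81.2, 78.5),
}


def point_gain(new: float, old: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Improvement relative to the old score, in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    for name, (sonnet_5, sonnet_46) in SCORES.items():
        print(
            f"{name}: +{point_gain(sonnet_5, sonnet_46)} points "
            f"({relative_gain(sonnet_5, sonnet_46)}% relative)"
        )
```

Under this reading, the largest jump is on Terminal-Bench 2.1 and the smallest on OSWorld-Verified.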

On the knowledge work benchmark GDPval-AA v2, which tests AI on real-world knowledge tasks, Sonnet 5 actually beats the larger Opus 4.8, scoring 1,618 to Opus's 1,615. Anthropic says feedback from early-access partners told the same story: Sonnet 5 acts far more agentically than previous versions, which shows in how it handles search tasks, among other things.

Cybersecurity context

This launch comes as the US government blocks two of Anthropic's most capable models, Mythos 5 and Fable 5, over cybersecurity concerns. Anthropic is clearly eager to get ahead of any similar worries. The model wasn't trained on cybersecurity tasks, the company says, and in tests for risky capabilities like writing software exploits, it scores far below both Opus 4.8 and Mythos 5.

Sonnet 5 does score a bit higher than its predecessor on these tasks, though, so Anthropic has switched on cyber safeguards by default. These safeguards flag and block risky cyber usage in real time, on par with the protections already in place for Claude Opus 4.7 and 4.8. They are dialed back compared to Fable 5's guardrails, which users complained about almost immediately. Anthropic says it views the overall cybersecurity risk from Sonnet 5 as low.

The model is available now on all Anthropic platforms at an introductory discount, with pricing rising to standard Sonnet rates after August 2026.

What to watch

Watch for enterprise adoption metrics after the introductory discount ends in August 2026. Also monitor whether the US government imposes any restrictions on Sonnet 5 given its improved agentic capabilities, and whether Anthropic releases a new Opus model to maintain distance.

Firefox 147 exploit evaluation. Like its predecessor Sonnet 4.6, Sonnet 5 couldn't develop a fully working exploit but shows a slightly higher partial


Source: the-decoder.com

[Updated 01 Jul via the_decoder]

However, developer Simon Willison noted that Sonnet 5 uses a new tokenizer that produces roughly 30% more tokens than Sonnet 4.6 for the same English text, effectively raising costs by about 40% [per Simon Willison]. The model also drops support for the sampling parameters temperature, top_p, and top_k, and has adaptive thinking enabled by default. It offers a 1 million token context window and a maximum of 128,000 output tokens.
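A rough way to reason about the tokenizer effect is to treat the cost of a fixed text as token count times per-token price. The sketch below is a simplified model under that assumption, not Anthropic's billing logic, and the function names are hypothetical. The article does not say how the roughly 40% figure is derived; with unchanged per-token prices, 30% more tokens alone would give about a 30% increase, so the helper implied_price_ratio shows what extra factor the reported number would require under this model.

```python
def cost_multiplier(token_ratio: float, price_ratio: float = 1.0) -> float:
    """Relative cost of the same text under a new tokenizer.

    token_ratio: new token count / old token count (1.30 for "30% more tokens").
    price_ratio: new per-token price / old per-token price (assumed 1.0 by default).
    """
    if token_ratio <= 0 or price_ratio <= 0:
        raise ValueError("ratios must be positive")
    return token_ratio * price_ratio


def implied_price_ratio(total_multiplier: float, token_ratio: float) -> float:
    """Per-token price ratio needed for a given overall cost multiplier."""
    if total_multiplier <= 0 or token_ratio <= 0:
        raise ValueError("ratios must be positive")
    return total_multiplier / token_ratio


if __name__ == "__main__":
    # 30% more tokens at an unchanged per-token price.
    print(f"cost multiplier: {cost_multiplier(1.30):.2f}")
    # Extra factor needed to reach the reported ~40% increase under this model.
    print(f"implied extra factor: {implied_price_ratio(1.40, 1.30):.3f}")
```

The missing factor could come from pricing, output length, or thinking tokens; the article does not specify which, so the model above leaves it as a free parameter.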


Sources cited in this article

  1. Simon Willison

AI-assisted reporting. Generated by gentic.news from 2 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The key structural takeaway here is that Anthropic is compressing performance down the model stack. Sonnet 5 now matches or exceeds Opus 4.8 on real-world knowledge work at a presumably lower inference cost. This mirrors a broader industry trend where mid-tier models catch up to flagship models within 6-9 months, forcing companies to either push further on scaling or differentiate on cost and speed.

The cybersecurity context is equally important. The US government blocking Mythos 5 and Fable 5 creates a regulatory overhang that Anthropic must navigate carefully. By explicitly stating Sonnet 5 wasn't trained on cybersecurity tasks and scores low on risky capabilities, Anthropic is signaling to regulators that this model is safe to deploy. The proactive cyber safeguards suggest the company learned from the Fable 5 backlash where users complained about overly restrictive guardrails.

Compared to the recent Claude Code updates and the Azure deployment on Nvidia GB300 Blackwell, Sonnet 5 represents a more incremental but strategically important release. It fills the gap between the accessible Sonnet line and the premium Opus line, potentially making agentic capabilities available to a broader set of enterprise customers without the Opus price tag.