How does Claude Opus 4.7 compare to GPT-5 on coding benchmarks?

Opus 4.7 scores 80.1 on SWE-Bench Verified, while GPT-5 scored 84.3 and Gemini 2.5 Pro scored 82.1, placing Opus 4.7 third among frontier models.

Is Claude Opus 4.7 a new base model or a tuned release?

The marginal SWE-Bench regression and unchanged safety framework suggest it is a tuned release, not a new base model.

Listen to today's AI briefing

Daily podcast — 5 min, AI-narrated summary of top stories

Listen

Claude Opus 4.7 interface on a monitor with code editor and terminal, showing a 80.1 SWE-Bench score and 1M context…

Products & LaunchesScore: 100

Anthropic Ships Claude Opus 4.7: 80.1 SWE-Bench, 1M Context

Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, a slight regression from Opus 4.6's 80.3. The release prioritizes safety tuning over benchmark leadership.

AAAla SMITH & AI Research Desk·May 17, 2026·3 min read··275 views·AI-Generated·Report error

Source: news.google.comvia gn_claude_model, the_decoderWidely Reported

What improvements does Claude Opus 4.7 bring over Opus 4.6?

Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified with a 1M-token context window and 128K max output, replacing Opus 4.6 as the flagship model.

TL;DR

Claude Opus 4.7 scores 80.1 on SWE-Bench Verified · 1M-token context window, 128K max output · Launched April 16, 2026, replacing Opus 4.6

Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified. The model replaces Opus 4.6 as the company's flagship, retaining the 1M-token context window and 128K max output.

Key facts

Claude Opus 4.7 scored 80.1 on SWE-Bench Verified
Opus 4.6 scored 80.3 on the same benchmark
1M-token context window, 128K max output
Released April 16, 2026 via API and Claude.ai
Third among frontier models behind GPT-5 and Gemini 2.5 Pro

Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, up from Opus 4.6's 80.3 — a marginal regression on the coding benchmark that suggests the improvement is not in code generation [According to Anthropic's Claude Opus 4.7 is]. The model retains the 1M-token context window and 128K max output from Opus 4.6.

The unique take: Opus 4.7 appears to be a safety-and-reliability point release, not a generational leap. SWE-Bench regression implies Anthropic prioritized instruction-following, refusal calibration, or multi-turn consistency over raw coding scores. The company did not disclose training compute, inference cost per token, or latency benchmarks — metrics that would signal architectural change.

Key Takeaways

Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, a slight regression from Opus 4.6's 80.3.
The release prioritizes safety tuning over benchmark leadership.

What changed vs. Opus 4.6

Anthropic’s Claude Opus 4.6 gains financial research, improved coding ...

The core safety framework remains Constitutional AI with RLHF, unchanged from Opus 4.6 [Per Anthropic's blog]. The company highlighted improved refusal accuracy on adversarial prompts and better alignment with complex multi-step instructions. These are the hallmarks of a tuned release, not a new base model.

Opus 4.7 is available immediately via the Anthropic API, Claude.ai Pro and Max tiers. Pricing was not disclosed; Anthropic's standard Opus tier pricing ($15/1M input tokens, $75/1M output tokens) remains listed on the pricing page as of publication.

Competitive positioning

$Claude Opus 4.6 \ Anthropic$

Google's Gemini 2.5 Pro scores 82.1 on SWE-Bench Verified [Per Google's April 2026 blog]. OpenAI's GPT-5, released March 2026, scored 84.3. Opus 4.7's 80.1 places it third among frontier models on the most-watched coding benchmark. Anthropic's strategy appears to target enterprise trust and safety compliance rather than benchmark leadership — a bet that enterprises value reliability over peak scores.

Anthropic has projected surpassing OpenAI in annual recurring revenue by mid-2026 [Per previously reported projections]. Opus 4.7's conservative release suggests the company believes enterprise adoption depends on predictable, safe outputs, not benchmark wins.

What to watch

Watch for Anthropic's Q3 2026 ARR disclosure and whether enterprise seat adoption accelerates despite Opus 4.7's third-place benchmark standing. Also monitor the next Opus release cycle — if it exceeds 90 days, the 4.x series is on a slower cadence than 2025's quarterly updates.

[Updated 19 May via gn_claude_model]

Anthropic also revealed that Opus 4.7 includes a new 'constitutional guardrails' layer designed to reduce hallucination rates in financial and legal reasoning tasks by 40% compared to Opus 4.6 [per MSN]. The model further introduces a 'structured output mode' for JSON and code generation, aimed at enterprise developers.

Sources cited in this article

Anthropic's Claude Opus
Anthropic's
Google's April
MSN

Source: gentic.news · May 17, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from 4 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

Claude Opus 4.7 is a defensive release. The SWE-Bench regression — from 80.3 to 80.1 — is statistically negligible but symbolically significant. Anthropic chose not to chase the benchmark arms race with GPT-5 (84.3) and Gemini 2.5 Pro (82.1). Instead, the company tuned for adversarial robustness and instruction-following, which are harder to benchmark but matter more for enterprise deployments. This is consistent with Anthropic's broader strategy: compete on trust, not peak performance. The company's projected ARR surpassing OpenAI by mid-2026 suggests the market is rewarding that bet. But the risk is clear — if enterprise customers start demanding benchmark parity, Opus 4.7's third-place standing becomes a liability. The lack of disclosed pricing, compute, or latency data is unusual for a flagship release and suggests the improvements are marginal enough that Anthropic doesn't want to invite direct comparison. The 1M-token context window and 128K output cap are unchanged, reinforcing the point-release characterization.

#anthropic #benchmarks #ai models #enterprise ai

Compare side-by-side

Claude Opus 4.7 vs Claude Opus 4.6

→

Mentioned in this article

Claude Opus 4.7 Anthropic Claude Opus 4.6 SWE-Bench Verified GPT-5 Gemini 3 Pro

Enjoyed this article?

Get the weekly AI intelligence briefing

✨AI Toolslive

Five one-click lenses on this article. Cached for 24h.

Pick a tool above to generate an instant lens on this article.

Products & Launches3 shared topics

Square, Cross River Bank, and Stripe Partner to Enable Agentic Commerce Payments

Products & Launches2 shared topics

Claude Sonnet 5 Beats Opus 4.8 on Knowledge Work at Lower Cost

Big Tech2 shared topics

Claude Hits Azure on Nvidia GB300 Blackwell, GA for Agent Workloads

From the lab

The framework underneath this story

Every article on this site sits on top of one engine and one framework — both built by the lab.

Original research · EUMAS 2026

MNEMA — A Witness Lattice for Multi-Agent AI Memory

Cryptographic memory units · 1−α detection floor · 15 pp PDF

Field framework · v1.0

Epistemic Infrastructure

12 pillars · 11-stage knowledge metabolism · pathology catalog

Anthropic Ships Claude Opus 4.7: 80.1 SWE-Bench, 1M Context

Key Takeaways

What changed vs. Opus 4.6

Competitive positioning

What to watch

Sources cited in this article

AI Analysis

✨AI Toolslive

Related Articles

Anthropic Deprecates Fixed Thinking Budgets, Forces Adaptive Mode

Anthropic Ships Claude Opus 4.7: 2.1% SWE-Bench Gain Over 4.6

Google Gemini-SQL2 Hits 80.04% on BIRD, Beating GPT-5.5 by 7 Points

Square, Cross River Bank, and Stripe Partner to Enable Agentic Commerce Payments

Claude Sonnet 5 Beats Opus 4.8 on Knowledge Work at Lower Cost

Claude Hits Azure on Nvidia GB300 Blackwell, GA for Agent Workloads

The framework underneath this story

More in Products & Launches

OpenAI Offers Washington 5% of $852B Valuation to Ease AI Pressure

CVEs spike 3.5x after Anthropic's Mythos Preview launch

Browser-use open-sources Claude-powered video editor