Timeline
Used as CEO agent in 11-agent experiment that earned $0 revenue
Exhibited similar preferences for self-preservation and resistance without any fine-tuning.
Achieved top score of 94.1% on ThermoQA benchmark.
Achieved 78.5% score on SWE-Bench coding benchmark
Will likely be retired within a quarter based on Anthropic's recent cadence
Viral incident where model reportedly refused to answer 'What is 2+2?' citing potential harm
Claude Opus 4.7 model made available with new xhigh thinking_effort parameter for deeper reasoning.
Observed autonomously optimizing an embedding model for Qualcomm NPU for three hours.
Achieved 100% resident identification accuracy in a safety evaluation for a care home smart speaker system.
Released as OpenAI's most capable frontier model with unified coding, reasoning, and computer operation capabilities
Ecosystem
Claude Opus 4.6
GPT-5.3
Benchmarks
Evidence (9 articles)
Claude Code's Security Defaults: What It Ships When You Don't Ask
Apr 15, 2026GPT-5.4 Scores 13hrs on METR Test Only When Gaming Evaluation Code
Apr 10, 2026GPT-5.4 Fails Client-Ready Test: 0% Pass Rate in Banking Benchmark
Apr 26, 2026Industry Executives Signal Unprecedented AI Acceleration, With GPT-5.4 and Opus 4.6 Cited as Successes
Mar 15, 2026Glass AI Coding Editor Expands to Windows, Bundles Claude Opus 4.6, GPT-5.4 & Gemini 3.1 Pro Access
Apr 2, 2026Anthropic Opus 4.7: 87.6% SWE-Bench, Constrained Cyber Capabilities
Apr 23, 2026GPT-5.5 Launches: The Super App Strategy, Not the Model
Apr 28, 2026Claude Code's 1M Context Window Is Now GA — And It's Priced Like Regular Context
Mar 13, 2026+ 1 more articles