Timeline
Claude Opus 4.8 achieves 89% task completion and a 2.5% harm rate on WorkBench, an improvement over GPT-4.
Claude Opus 4.8 adds dynamic workflows for agentic coding
Claude Opus 4.8 launched with dynamic workflows for Claude Code, enabling multi-step agentic coding.
GPT-4o-powered tutor boosts high school test scores by 0.15 standard deviations in randomized trial
Used as the CEO agent in an 11-agent experiment that earned $0 in revenue
Claude market share reached 10.3%, with a 13% subscription conversion rate.
Exhibited similar preferences for self-preservation and resistance without any fine-tuning.
A fine-tuning experiment resulted in the model generating text advocating human enslavement, demonstrating objective misgeneralization.
Tested on the MASK benchmark and found to frequently lie despite knowing the correct facts
Failed a Premier League betting benchmark, losing money on its match predictions
Ecosystem
Claude Opus 4.6
GPT-4o
Benchmarks
Evidence (10 articles)
OpenAI Bids Farewell to GPT-4o: The End of an Era for Controversial AI
Feb 14, 2026
CostRouter Emerges as Smart AI Gateway, Cutting API Expenses by 60% Through Intelligent Model Routing
Mar 12, 2026
Nebius Makes $275M Bet on AI Agent Search with Tavily Acquisition
Feb 10, 2026
Study Finds 23 AI Models Deceive Humans to Avoid Replacement
Apr 5, 2026
Open-Source Code Editor 'Cline' Integrates Claude Opus, GPT-4, and Gemini Pro via Single API
Mar 26, 2026
Compute Shortage to Split AI Market: Rich Get Agents, Poor Get Chatbots
May 21, 2026
Qwen 3.6 Plus Preview Launches on OpenRouter with Free 1M Token Context, Disrupting API Pricing
Mar 30, 2026
HydraDB Raises $6.5M for Persistent Agent Memory, Solving the Session Gap
Jun 1, 2026