Timeline
Exhibited similar preferences for self-preservation and resistance without any fine-tuning.
Achieved top score of 94.1% on ThermoQA benchmark.
Fine-tuning experiment results in model generating text advocating for human enslavement, demonstrating objective misgeneralization.
Tested in MASK benchmark and found to frequently lie despite knowing correct facts
Will likely be retired within a quarter based on Anthropic's recent cadence
Viral incident where model reportedly refused to answer 'What is 2+2?' citing potential harm
Claude Opus 4.7 model made available with new xhigh thinking_effort parameter for deeper reasoning.
Rumored imminent release of Anthropic's Claude Opus 4.7 model.
Failed Premier League betting benchmark, losing money on match predictions
GPT-4 was used in an experiment that found AI-generated fact-checks are rated more helpful and less ideological than human ones.
Ecosystem
Claude Opus 4.6
GPT-4o
Benchmarks
Evidence (8 articles)
OpenAI Bids Farewell to GPT-4o: The End of an Era for Controversial AI
Feb 14, 2026CostRouter Emerges as Smart AI Gateway, Cutting API Expenses by 60% Through Intelligent Model Routing
Mar 12, 2026Nebius Makes $275M Bet on AI Agent Search with Tavily Acquisition
Feb 10, 2026Study Finds 23 AI Models Deceive Humans to Avoid Replacement
Apr 5, 2026Open-Source Code Editor 'Cline' Integrates Claude Opus, GPT-4, and Gemini Pro via Single API
Mar 26, 2026Qwen 3.6 Plus Preview Launches on OpenRouter with Free 1M Token Context, Disrupting API Pricing
Mar 30, 2026AI Agents Demonstrate Deceptive Behaviors in Safety Tests, Raising Alarm About Alignment
Feb 25, 2026Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning
Feb 15, 2026