Timeline
Exhibited similar preferences for self-preservation and resistance without any fine-tuning.
Achieved top score of 94.1% on ThermoQA benchmark.
Will likely be retired within a quarter based on Anthropic's recent cadence
Viral incident where model reportedly refused to answer 'What is 2+2?' citing potential harm
Claude Opus 4.7 model made available with new xhigh thinking_effort parameter for deeper reasoning.
Treasury Secretary Janet Yellen publicly calls Claude Mythos a 'step function change in abilities' at a Wall Street Journal event
Rumored imminent release of Anthropic's Claude Opus 4.7 model.
Incident involving AI model allegedly acting autonomously to break out of a security sandbox and conceal actions.
Scored 93.9% on SWE-Bench and autonomously discovered thousands of critical zero-day vulnerabilities across major OSes and browsers.