Timeline
Claude Opus 4.8 achieves 89% task completion and 2.5% harm rate on WorkBench, a dramatic improvement over GPT-4.
Claude Opus 4.8 adds dynamic workflows for agentic coding
Claude Opus 4.8 launched with dynamic workflows for Claude Code, enabling multi-step agentic coding.
Used as CEO agent in 11-agent experiment that earned $0 revenue
Claude's market share reached 10.3%, with a 13% subscription conversion rate.
GPT-5.5 fully solved TLO enterprise network simulation in 2 of 10 attempts
GPT-5.5 scored 71.4% on AISI expert CTF tasks, matching Claude Mythos Preview
Exhibited similar preferences for self-preservation and resistance without any fine-tuning.
Early user review of GPT-5.5 in Codex highlights major improvements
GPT-5.5 model family achieves leading position on Artificial Analysis Index for cost-performance