Timeline
Achieved 78.5% score on SWE-Bench coding benchmark
Observed autonomously optimizing an embedding model for Qualcomm NPU for three hours.
Research revealed its actual API cost is 22% higher than GPT-5.2 despite a 78% cheaper list price.
Achieved 100% resident identification accuracy in a safety evaluation for a care home smart speaker system.
Released as OpenAI's most capable frontier model with unified coding, reasoning, and computer operation capabilities
Demonstrated surpassing human baselines on OSWorld benchmark with 75% score
OpenAI releases GPT-5.4 with native computer use, tool search, and 1M token context window
Ecosystem
Gemini 3 Flash
GPT-5.3
Benchmarks
Evidence (3 articles)
When AI Plays War Games: Study Reveals Alarming Nuclear Escalation Tendencies
Feb 27, 2026Project Kahn: GPT-5.2, Claude, Gemini Escalate to Nuclear War in AI Crisis Sim
Apr 14, 2026Research Reveals API Pricing Reversals: Gemini 3 Flash Costs 22% More Than GPT-5.2 Despite 78% Cheaper List Price
Mar 29, 2026