Coverage (30d)
2 vs 1
This Week
2 vs 1
Evidence
1 article
Relationships
0
Timeline
GPT-5.5, 2026-06-12
Open-source MoE model matches GPT-5.5 on agentic coding benchmarks
GPT-5.5, 2026-06-01
GPT-5.5 achieves 16% Pass@8 on MA-ProofBench Level I and 5% on Level II, outperforming most models.
MA-ProofBench, 2026-06-01
MA-ProofBench released as the first formal theorem-proving benchmark for mathematical analysis, with 200 theorems across 6 topics.
Benchmarks
Benchmark         GPT-5.5   MA-ProofBench
osworld-verified  78.7      n/a
swe-pro           58.6      n/a
swe-verified      82.6      n/a