Coverage (30d): 10 vs 6
This Week: 4 vs 2
Evidence: 0 articles
Relationships: 1
Timeline
Gemini 3 Pro, 2026-04-16
Achieved top score on METR time horizon benchmark, handling 90-minute software tasks.
Claude 3, 2026-04-14
Showed 87% hyper-truth rate in neutrosophic logic evaluation study.
Claude 3, 2026-04-12
Failed Premier League betting benchmark, losing money on match predictions.
Claude 3, 2026-04-11
Claude 2 was used in an experiment that found AI-generated fact-checks are rated more helpful and less ideological than human ones.
Gemini 3 Pro, 2026-02-20
Achieved state-of-the-art status on most benchmarks according to preliminary evaluations.
Ecosystem
Claude 3
competes with GPT-4o (2 sources)
competes with Gemini (1 source)
uses Community Notes (1 source)
Gemini 3 Pro
uses MMLU (1 source)
competes with GPT-4 Turbo (1 source)
competes with Claude 3 (1 source)
Benchmarks
Benchmark            Claude 3   Gemini 3 Pro
MMLU-Pro             n/a        90.1
Arena Elo            n/a        1485
SWE-bench Verified   n/a        80.6