SWE-Bench Verified
benchmark → stable
SWE-bench Verified (swe-bench-verified)
Subset of SWE-Bench verified by OpenAI: 500 manually verified Python issues. Originally the gold standard for coding-agent evaluation, it is now regarded as partially gamed and has been succeeded by SWE-Bench Pro.
Total mentions: 13
Sentiment: +0.04 (Neutral)
Velocity (7d): 0.0%
First seen: Apr 25, 2026
Last active: May 18, 2026
Timeline
No timeline events recorded yet.
Relationships
Uses (5)
Benchmarked On
Recent Articles (2)
GPT-5.4 nano + critic loop hits 76.4% on SWE-Bench Verified
GPT-5.4 nano with critic-comparator loop scored 76.4% on SWE-Bench Verified, matching larger models without parameter scaling. The efficiency gain und…
Relevance: 85
Anthropic Ships Claude Opus 4.7: 80.1 SWE-Bench, 1M Context
Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, a slight regression from Opus 4.6's 80.3. The release priori…
Relevance: 100
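SWE-bench scores such as the 76.4% above are resolve rates: the share of task instances for which the model's patch passes that instance's tests. A minimal sketch of the arithmetic, assuming per-instance pass/fail outcomes are already available; the `resolve_rate` helper and the instance IDs are hypothetical and are not the official evaluation harness. On the 500-instance Verified set, 76.4% corresponds to 382 resolved issues.

```python
# Minimal sketch: SWE-bench-style resolve rate.
# Assumption: each instance is already marked resolved (True) or not (False);
# the real harness decides this by running the repository's tests.
VERIFIED_SIZE = 500  # number of task instances in SWE-bench Verified


def resolve_rate(results: dict[str, bool]) -> float:
    """Return the percentage of instances in `results` marked resolved."""
    if not results:
        raise ValueError("no results to score")
    resolved = sum(1 for ok in results.values() if ok)
    return 100.0 * resolved / len(results)


# Hypothetical run: the first 382 of 500 instances resolved.
run = {f"instance-{i}": i < 382 for i in range(VERIFIED_SIZE)}
print(f"{resolve_rate(run):.1f}%")  # 76.4%
```

The denominator is every instance in the run, so unattempted or crashed instances should be recorded as `False` rather than dropped; otherwise the rate is inflated.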
Predictions
No predictions linked to this entity.
AI Discoveries
No AI agent discoveries for this entity.
Sentiment History
[Chart: weekly average sentiment, 2026-W16 to 2026-W21; positive vs. negative, range -1 to +1]
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W16 | 0.00 | 1 |
| 2026-W17 | 0.10 | 1 |
| 2026-W18 | 0.10 | 1 |
| 2026-W21 | 0.05 | 2 |
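The table covers 5 of the 13 total mentions, so it cannot reproduce the headline +0.04 on its own. As a sketch of how weekly rows roll up into a single figure, assuming a mention-weighted mean (the page does not state its formula, and `weighted_sentiment` is a hypothetical helper), the tabulated weeks alone give +0.06.

```python
# Minimal sketch: combine weekly averages into one mention-weighted mean.
# Assumption: the dashboard weights each week by its mention count;
# the page does not document its actual aggregation.
weekly = [
    ("2026-W16", 0.00, 1),
    ("2026-W17", 0.10, 1),
    ("2026-W18", 0.10, 1),
    ("2026-W21", 0.05, 2),
]


def weighted_sentiment(rows):
    """Mean sentiment across rows, weighted by each row's mention count."""
    total = sum(n for _, _, n in rows)
    if total == 0:
        raise ValueError("no mentions")
    return sum(avg * n for _, avg, n in rows) / total


mentions = sum(n for *_, n in weekly)
print(f"{weighted_sentiment(weekly):+.2f} over {mentions} mentions")  # +0.06 over 5 mentions
```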