SWE-Bench Verified
benchmark · stable
SWE-bench Verified (swe-bench-verified)
OpenAI's human-verified 500-issue subset of SWE-bench. Approaching saturation in 2026: most frontier models score 80% or higher.
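The percentages quoted on this page are resolve rates. Each of the 500 task instances counts as resolved when the model's patch makes the instance's target tests pass without breaking the tests that already passed. The sketch below shows how a resolve rate maps to an instance count. The function names and the per-instance result list are hypothetical and used only for illustration. This is not the official SWE-bench harness.

```python
# Minimal sketch, assuming the score is the share of SWE-bench Verified
# instances marked "resolved". Function names are hypothetical.

VERIFIED_SIZE = 500  # number of task instances in SWE-bench Verified


def resolved_rate(resolved_flags: list[bool]) -> float:
    """Return the percentage of instances whose patch was judged resolved."""
    if not resolved_flags:
        raise ValueError("no per-instance results given")
    return 100.0 * sum(resolved_flags) / len(resolved_flags)


def resolved_count(percent: float, total: int = VERIFIED_SIZE) -> float:
    """Convert a reported percentage back to an approximate instance count."""
    return percent / 100.0 * total


if __name__ == "__main__":
    # Hypothetical run: 382 of the 500 instances resolved.
    flags = [True] * 382 + [False] * 118
    print(f"{resolved_rate(flags):.1f}%")  # 76.4%
    print(round(resolved_count(76.4)))     # 382
```

For example, a reported 76.4% corresponds to 382 resolved instances out of 500. `resolved_count` returns a float because a reported figure may be rounded or averaged over several runs. Any integer you derive from it is therefore an approximation.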
Total mentions: 13
Sentiment: +0.04 (neutral)
Velocity (7d): 0.0%
First seen: Apr 25, 2026
Last active: May 18, 2026
[Chart: Signal Radar. Five-axis snapshot of this entity's footprint.]
[Chart: Mentions × Lab Attention. Weekly mentions (solid line) and average article relevance (dotted line).]
Timeline
No timeline events recorded yet.
Relationships
Benchmarked On: 5
Uses
Frequently appears with: 9 (entities that show up in the same articles; shared coverage, not a stated relationship)
Recent Articles (2)

GPT-5.4 nano + critic loop hits 76.4% on SWE-Bench Verified
GPT-5.4 nano with a critic-comparator loop scored 76.4% on SWE-Bench Verified, matching larger models without parameter scaling. The efficiency gain und…
Relevance: 85

Anthropic Ships Claude Opus 4.7: 80.1 SWE-Bench, 1M Context
Anthropic released Claude Opus 4.7 on April 16, 2026, scoring 80.1 on SWE-Bench Verified, a slight regression from Opus 4.6's 80.3. The release priori…
Relevance: 100
Predictions
No predictions linked to this entity.
AI Discoveries
No AI agent discoveries for this entity.
Sentiment History
Weekly average sentiment, on a scale from -1 (negative) to +1 (positive):
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W16 | 0.00 | 1 |
| 2026-W17 | 0.10 | 1 |
| 2026-W18 | 0.10 | 1 |
| 2026-W21 | 0.05 | 2 |