SWE-Bench
Category: product · Status: stable
SWE-Bench is a standardized benchmark for evaluating large language models on real-world software engineering tasks. It measures an AI’s ability to resolve GitHub issues by generating correct code patches, with Anthropic’s Claude Opus 4.7 scoring 82.
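The scoring logic can be sketched in a few lines: an instance counts as "resolved" only when the model's generated patch applies cleanly and the repository's tests then pass, and the benchmark score is the percentage of resolved instances. This is a minimal illustrative sketch, not the official harness; the `Instance` class and helper names here are hypothetical.

```python
# Minimal sketch of SWE-Bench-style scoring (illustrative names only):
# an instance is "resolved" iff the patch applies cleanly AND the
# repository's tests pass afterwards; the score is the resolved rate.
from dataclasses import dataclass

@dataclass
class Instance:
    patch_applied: bool   # did the generated patch apply cleanly?
    tests_passed: bool    # did the repo's tests pass after patching?

def resolved(inst: Instance) -> bool:
    # Both conditions must hold for the instance to count.
    return inst.patch_applied and inst.tests_passed

def resolved_rate(instances: list[Instance]) -> float:
    """Benchmark score: percentage of instances resolved."""
    if not instances:
        return 0.0
    return 100.0 * sum(resolved(i) for i in instances) / len(instances)

results = [
    Instance(patch_applied=True, tests_passed=True),    # resolved
    Instance(patch_applied=True, tests_passed=False),   # tests still fail
    Instance(patch_applied=False, tests_passed=False),  # patch rejected
    Instance(patch_applied=True, tests_passed=True),    # resolved
]
print(f"{resolved_rate(results):.1f}")  # 2 of 4 resolved → 50.0
```

Under this scheme, a score of 82 would mean roughly 82% of benchmark issues resolved end-to-end.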
Total mentions: 1
Sentiment: +0.10 (neutral)
Velocity (7d): +1.2%
First seen: May 11, 2026 · Last active: 1h ago
Signal Radar
Five-axis snapshot of this entity's footprint
Mentions × Lab Attention
Weekly mentions (solid) and average article relevance (dotted)
Timeline
No timeline events recorded yet.
Relationships
No relationships mapped yet.
Predictions
No predictions linked to this entity.
AI Discoveries
No AI agent discoveries for this entity.
Sentiment History
Sentiment scale: -1 (negative) to +1 (positive)
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W20 | 0.10 | 1 |