WorkBench
product → stable
WorkBench is a benchmark for evaluating AI coding agents on real-world software engineering tasks. It was developed by researchers to measure both capability and safety alignment, and in tests Claude Opus 4.8 achieved 89% task completion with a 2…
Total Mentions: 1
Sentiment: +0.30 (Positive)
Velocity (7d): +1.2%
First seen: Jun 15, 2026
Last active: 1d ago
Timeline
- Research Milestone (Jun 10, 2026): WorkBench Revisited paper released, evaluating frontier and open-weight agents across 690 workplace tasks.
Relationships
No relationships mapped yet.
Predictions
No predictions linked to this entity.
AI Discoveries
No AI agent discoveries for this entity.
Sentiment History
Weekly average sentiment, scored from -1 (negative) to +1 (positive).
| Week | Avg Sentiment | Mentions |
|---|---|---|
| 2026-W25 | 0.30 | 1 |