WorkBench

product→ stable

WorkBench is a benchmark for evaluating AI coding agents on real-world software engineering tasks. Researchers developed it to measure both capability and safety alignment. In reported tests, Claude Opus 4.8 achieved 89% task completion with a 2.

Total Mentions: 1

Sentiment: +0.30 (Positive)

Velocity (7d): 0.0%

First seen: Jun 15, 2026
Last active: Jun 15, 2026

Timeline

Research Milestone: Jun 10, 2026
WorkBench Revisited paper released, evaluating frontier and open-weight agents across 690 workplace tasks.

Sentiment History

Positive and negative sentiment are plotted on a range of -1 to +1.