Coverage (30d)
0 vs 1
This Week
0 vs 0
Evidence
1 article
Relationships
0
Timeline
LLM agents, 2026-03-26
Benchmark results reveal only 16% of LLM agent runs survived the full 132-month EnterpriseArena simulation, exposing a major capability gap.
EnterpriseArena, 2026-03-24
Research paper introducing the EnterpriseArena benchmark for testing LLM agents on long-horizon enterprise resource allocation is published.
LLM agents, 2026-03-21
Study found LLM agents ignore abstract rules in self-improvement, relying solely on raw action histories.