
AI Research
LABBench2 Benchmark Shows AI Biology Agents Struggle with Real-World Tasks
Researchers introduced LABBench2, a 1,900-task benchmark for AI in biology research. It shows current models perform 26-46% worse on realistic tasks v...
arxiv.org·21h ago·3 min read·Widely Reported
agents·research·benchmarks