AI Has Surpassed Human Experts in Technical Skills, New Analysis Reveals
A groundbreaking analysis from the General-Purpose AI Policy Lab has mapped artificial intelligence capabilities directly to human expertise levels, revealing that current frontier AI models have already surpassed domain experts in technical and scientific benchmark tasks. The research, which extends the "Rosetta Stone for AI Benchmarks" framework developed by Epoch AI and Google DeepMind researchers, provides the first concrete human-anchored scale for interpreting AI capability scores.
The Rosetta Stone Framework: From Abstract Scores to Human Benchmarks
The original "Rosetta Stone" paper, published on arXiv, created a unified difficulty scale that allows different AI benchmarks and models to be compared directly. This framework underpins the Epoch Capability Index (ECI), which has become a valuable tool for tracking AI progress. However, as the researchers note, the resulting capability scores remained abstract—what does a score of 2.54 actually mean in practical terms?
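One way to make that scale concrete: a single shared axis of capability and difficulty is exactly the structure an item response model captures, where each model gets a latent ability and each task a latent difficulty, fit jointly so that success probability depends only on their difference. The Python sketch below is a minimal Rasch-style illustration of that idea on toy data; the numbers, names, and fitting loop are illustrative, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 hypothetical models, 4 hypothetical benchmark tasks, with
# each model attempting each task 25 times. In a Rasch-style item response
# model, success probability depends only on ability[m] - difficulty[i].
true_ability = np.array([-1.0, 0.0, 1.5])
true_difficulty = np.array([-2.0, -0.5, 0.5, 2.0])
attempts = 25

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Simulate success counts from the "true" parameters.
successes = rng.binomial(attempts, sigmoid(true_ability[:, None] - true_difficulty[None, :]))

# Jointly recover abilities and difficulties by gradient ascent on the
# binomial log-likelihood of the observed counts.
ability, difficulty = np.zeros(3), np.zeros(4)
for _ in range(2000):
    resid = successes - attempts * sigmoid(ability[:, None] - difficulty[None, :])
    ability += 0.5 * resid.sum(axis=1) / (attempts * 4)
    difficulty -= 0.5 * resid.sum(axis=0) / (attempts * 3)
    difficulty -= difficulty.mean()  # the scale's origin is arbitrary; pin it

print("estimated abilities:   ", ability.round(2))
print("estimated difficulties:", difficulty.round(2))
```

On a scale like this, a score of 2.54 is meaningful only relative to the task difficulties fit alongside it, which is precisely the interpretability gap the human baselines are meant to close.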
To address this interpretability gap, the research team integrated human performance baselines directly into the Rosetta framework. They collected data on human performance across multiple expertise levels, ranging from crowd workers to PhD-level domain experts and top performers in their fields. This integration transforms abstract capability scores into meaningful comparisons with human abilities.
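To see how a human baseline could land on the same axis, one natural approach is to treat a human cohort like another "model": given its success counts on tasks whose difficulties have already been fit, solve a one-dimensional maximum-likelihood problem for the cohort's ability. The helper below continues the toy example above (reusing `sigmoid` and the fitted `difficulty`); the expert counts are invented for illustration, and this is not the study's code.

```python
def cohort_ability(successes, attempts, difficulty, iters=1000, lr=0.5):
    """MLE of a single latent ability from per-task success counts,
    holding the already-fitted Rasch task difficulties fixed."""
    a = 0.0
    for _ in range(iters):
        p = sigmoid(a - difficulty)
        # Gradient of the binomial log-likelihood with respect to ability.
        a += lr * np.sum(successes - attempts * p) / np.sum(attempts)
    return a

# Hypothetical cohort of domain experts, 50 attempts per task (made-up counts).
expert_attempts = np.full(4, 50)
expert_successes = np.array([49, 43, 31, 12])
print("expert-cohort ability:", round(cohort_ability(expert_successes, expert_attempts, difficulty), 2))
```

Once a cohort's ability sits on the scale, "frontier models have passed domain experts" becomes a direct comparison of two points on one axis rather than a qualitative judgment.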
Methodology: Navigating Benchmark Biases
One significant challenge the researchers faced was that some benchmarks have been specifically designed to be "AI-hard"—tasks that are easy for humans but difficult for AI systems. This design choice contradicts the Rosetta framework's assumption of a single axis of capability and difficulty. To account for this potential bias, the team performed their analysis both with and without these specially designed benchmarks, ensuring their conclusions weren't skewed by benchmark design choices.
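One way to picture that robustness check, continuing the same toy setup: flag the suspect benchmarks, fit the scale twice (with and without them), and compare the fitted abilities. The boolean flag below is hypothetical, and real benchmark selection is of course more involved than a mask.

```python
def fit_abilities(successes, attempts, iters=2000, lr=0.5):
    """Joint Rasch fit as above, packaged for reuse; returns model abilities."""
    n_models, n_items = successes.shape
    a, d = np.zeros(n_models), np.zeros(n_items)
    for _ in range(iters):
        resid = successes - attempts * sigmoid(a[:, None] - d[None, :])
        a += lr * resid.sum(axis=1) / (attempts * n_items)
        d -= lr * resid.sum(axis=0) / (attempts * n_models)
        d -= d.mean()
    return a

# Hypothetical flag: suppose the hardest toy task was designed to be "AI-hard".
ai_hard = np.array([False, False, False, True])
abil_all = fit_abilities(successes, attempts)
abil_sub = fit_abilities(successes[:, ~ai_hard], attempts)
# Each fit pins its own origin, so compare origin-free gaps between models.
print("gaps, all tasks:      ", (abil_all - abil_all[0]).round(2))
print("gaps, AI-hard dropped:", (abil_sub - abil_sub[0]).round(2))
```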
The analysis focused primarily on technical and scientific benchmark skills, where human performance data was most reliable and comparable. This restriction is important—the findings don't necessarily apply to creative, social, or physical tasks where AI capabilities differ significantly.
Key Findings: AI's Rapid Ascent Through Human Expertise Levels
The results reveal a startling pace of AI advancement relative to human expertise:
- Average Human Level: Frontier AI models crossed this threshold in late 2022
- Skilled Generalist Level: AI surpassed this level in early 2024
- Domain Expert Level: Current frontier models have now exceeded this level (as of 2025)
Perhaps most strikingly, the research forecasts that AI will reach the Top-Performer human level by October 2027, with a 95% confidence interval ranging from May 2027 to March 2028. If that projection holds, AI systems would outperform the best humans on technical benchmark tasks within roughly two years.
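Mechanically, a crossing-date forecast like this typically comes from fitting a trend to capability scores over time and solving for when the trend reaches a fixed threshold, with the interval reflecting uncertainty in the fit. The sketch below shows one standard recipe, an ordinary least-squares line with a residual bootstrap; every number in it is made up, and the study's actual forecasting model may well differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frontier capability scores by (fractional) year.
years = np.array([2022.5, 2023.0, 2023.5, 2024.0, 2024.5, 2025.0])
scores = np.array([0.2, 0.6, 1.1, 1.4, 2.0, 2.4])
threshold = 3.5  # made-up "top performer" level on the latent scale

def crossing_year(x, y, level):
    slope, intercept = np.polyfit(x, y, 1)  # ordinary least-squares line
    return (level - intercept) / slope      # year where the line hits level

point = crossing_year(years, scores, threshold)

# Residual bootstrap: refit on fitted values plus resampled residuals to get
# a crude 95% interval on the crossing date.
slope, intercept = np.polyfit(years, scores, 1)
fitted = slope * years + intercept
resid = scores - fitted
draws = [crossing_year(years, fitted + rng.choice(resid, size=resid.size), threshold)
         for _ in range(2000)]
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"point forecast: {point:.2f}; 95% interval: [{lo:.2f}, {hi:.2f}]")
```

As the limitations discussed below make clear, everything in such a construction rides on the historical trend continuing to hold out of sample.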
Limitations and Caveats
The researchers emphasize several important limitations to their analysis. First, benchmarks remain imperfect proxies for real-world capabilities—excelling at benchmark tasks doesn't necessarily translate to practical expertise in complex, real-world scenarios. Second, human performance data is inconsistently collected and sparse for many domains, making precise comparisons challenging.
Additionally, the forecasted timeline should be "interpreted with a grain of salt," as the researchers note. AI progress isn't guaranteed to follow historical trends, and unexpected bottlenecks or breakthroughs could accelerate or delay these projections.
Implications for AI Policy and Development
This research has significant implications for AI policy and safety. By providing concrete human reference points for AI capabilities, policymakers and researchers can better assess when AI systems might reach concerning capability thresholds. The finding that AI has already surpassed domain experts in technical benchmarks suggests we may need to rethink traditional assumptions about human expertise maintaining an advantage in specialized domains.
The integration of human baselines into capability assessment frameworks represents an important step toward more interpretable AI evaluation. As AI systems become more capable, understanding how they compare to human abilities becomes increasingly crucial for responsible development and deployment.
Source: General-Purpose AI Policy Lab research blog, extending the "Rosetta Stone for AI Benchmarks" framework developed by Epoch AI and Google DeepMind researchers.