AI Research · Breakthrough · Score: 90

ByteDance Finds AI Agents Double Learning Speed Every 3 Months

ByteDance's Seed AI team reports that AI agents double their learning speed every three months through real-world interaction, according to a paper released Thursday. The finding rests on EdgeBench, a benchmark of 134 tasks that each require at least 12 hours of agent operation.

Ala SMITH & AI Research Desk · 5h ago · 3 min read · AI-Generated

Source: scmp.com via scmp_tech · Corroborated

What scaling law did ByteDance discover for AI agents?

ByteDance's Seed AI team found that AI agents can double their learning speed every three months through real-world interaction, per a paper released Thursday. The finding offers a new scaling path as traditional pre-training methods face diminishing returns.

TL;DR

ByteDance discovers new scaling law for AI agents · Learning speed doubles every three months in real-world tasks · EdgeBench benchmark features 134 ultra-long-horizon tasks

ByteDance's Seed AI team published a paper Thursday revealing that AI agents can double their learning speed every three months through real-world interaction. The finding offers a new scaling paradigm as traditional pre-training methods, which OpenAI co-founder Andrej Karpathy warned cannot last forever, hit diminishing returns.

Key facts

ByteDance's Seed AI team published the paper on Thursday
AI agents double learning speed every three months in real-world tasks
EdgeBench features 134 ultra-long-horizon tasks, each ≥12 hours
Tasks span software engineering, scientific discovery, and formal math
Andrej Karpathy warned brute-force pre-training scaling cannot last

The paper, posted by researchers at the TikTok parent's AI lab, tackles a blind spot in the agentic AI push. While tech firms race to deploy autonomous software that executes tasks on a human's behalf, ByteDance researchers noted that how these systems "learn from real-world environments after deployment remains far less understood," according to SCMP.

To quantify that learning, the team built EdgeBench, a benchmark suite of 134 ultra-long-horizon tasks across software engineering, scientific discovery, formal mathematics, and professional knowledge work. Each task demands at least 12 hours of continuous AI agent operation — far longer than typical agent benchmarks like SWE-Bench or GAIA, which measure single-session performance.

The core result: agents double their task-completion speed every three months when deployed in real-world environments. This "deployment scaling law" mirrors the compute scaling law that drove GPT-4o and its peers, but runs on post-deployment interaction data rather than pre-training compute.

Why the timing matters

The finding lands as the global AI industry searches for new ways to improve models. For years, developers relied on feeding systems more data and computing power during initial training. Prominent figures — including OpenAI co-founder Andrej Karpathy — have warned that this brute-force approach cannot last forever. The recent Epoch AI EBR-Bench results, where top models scored only 30-50% on experience-based reasoning, underscore the gap.

ByteDance's result suggests that agentic AI may unlock a second scaling axis: time spent interacting with real environments. If the doubling holds, agents deployed today would be 16x faster in a year and 256x faster in two years — without additional pre-training compute.
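The arithmetic behind those multipliers can be checked with a short sketch. It assumes a clean doubling every three months with no slowdown, which is the "if the doubling holds" scenario rather than anything the paper guarantees. The function name `speedup_after` and its parameters are illustrative and do not come from the paper.

```python
def speedup_after(months: float, doubling_period_months: float = 3.0) -> float:
    """Speed multiplier after `months`, assuming one doubling per period.

    Assumption: pure exponential growth with a fixed doubling period.
    """
    if doubling_period_months <= 0:
        raise ValueError("doubling_period_months must be positive")
    # Number of doublings elapsed is months / period; each one multiplies by 2.
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    for months in (3, 6, 12, 24):
        print(f"{months:>2} months: {speedup_after(months):.0f}x")
```

Four doublings in 12 months give 16x and eight doublings in 24 months give 256x, matching the figures above. Any drift in the doubling period compounds quickly, so the two-year figure is far more sensitive to the assumption than the one-year figure.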

Caveats and open questions

The paper does not disclose the exact environment conditions, agent architectures, or whether the scaling law generalizes across different agent designs. The study used ByteDance's own agent systems; independent replication will be critical. The company also did not reveal whether the law holds beyond the 134 tasks in EdgeBench or whether it applies to frontier models like GPT-5.6 Sol or Claude.

Still, the finding provides a concrete counterpoint to the narrative that AI scaling is exhausted. If deployment-time learning can sustain progress, the industry's massive investment in agentic infrastructure — from Anthropic's Claude Code to OpenAI's Codex API — may have a compounding return that pre-training alone could not deliver.

What to watch

Watch for independent replication attempts from OpenAI, Anthropic, or Google on the deployment scaling law. If confirmed, expect agent infrastructure investment to accelerate — and a shift in how model performance improvements are measured, from pre-training compute to months-in-production velocity.

Sources cited in this article

SCMP

AI-assisted reporting. Generated by gentic.news from 2 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

ByteDance's deployment scaling law is the most concrete counterargument yet to the 'scaling is dead' thesis that has dogged AI stocks since mid-2025. While pre-training compute scaling has shown diminishing returns (GPT-5.6 Sol's gains over GPT-4o were modest relative to compute spent), the agentic scaling axis operates on a fundamentally different resource: post-deployment interaction time.

The 3-month doubling rate is striking but should be treated with caution. The paper does not control for environment complexity, agent architecture, or whether the law holds across different task distributions. If the doubling is partly an artifact of early-stage learning on simple tasks, it may slow as agents saturate. Conversely, if it holds on EdgeBench's 12-hour tasks, it suggests meaningful long-horizon reasoning improvement.

The strategic implication: companies with the most deployed agents, such as ByteDance (TikTok), OpenAI, and Anthropic, may compound their advantage through deployment data moats. Pre-training compute is a commodity anyone can buy; deployment-time learning data is proprietary and cumulative.
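The caution about saturation can be made concrete with a toy comparison. The sketch below contrasts uninterrupted doubling with a logistic curve that starts out close to the same growth rate but levels off at a ceiling. The logistic form, the 32x ceiling, and the function names are assumptions chosen purely for illustration; none of them come from the paper.

```python
import math


def exponential_speedup(months: float, doubling_period: float = 3.0) -> float:
    """Speed multiplier under uninterrupted doubling every `doubling_period` months."""
    return 2.0 ** (months / doubling_period)


def saturating_speedup(
    months: float, ceiling: float = 32.0, doubling_period: float = 3.0
) -> float:
    """Logistic growth from 1x toward a hypothetical `ceiling`.

    Assumption: growth begins near the exponential rate and flattens
    as the multiplier approaches the ceiling.
    """
    rate = math.log(2.0) / doubling_period  # growth rate implied by the doubling period
    return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-rate * months))


if __name__ == "__main__":
    for months in (3, 6, 12, 24):
        exp_x = exponential_speedup(months)
        sat_x = saturating_speedup(months)
        print(f"{months:>2} months: exponential {exp_x:6.1f}x, saturating {sat_x:5.1f}x")
```

Under these assumed parameters the two curves stay close for the first few months and then diverge sharply, which is why a short observation window alone cannot distinguish a durable scaling law from early-stage gains.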
