Ethan Mollick, a Wharton professor who tracks AI productivity research, stated on X that, as far as he can tell, no rigorous productivity study exists for the autonomous coding tools that appeared starting in December 2025. Every published paper, he noted, predates the Claude Code and Codex revolution.
Key facts
- No productivity study exists for tools launched December 2025
- All published papers predate Claude Code/Codex era
- Ethan Mollick identified the gap on X
- Autonomous agents differ from copilot-style completion tools
- Vendor productivity claims remain unverified by independent research
Wharton professor Ethan Mollick posted on X that the autonomous coding tools that emerged in December 2025 have not yet been rigorously studied for their effect on productivity. "We have, as far as I can tell, no good tests of the productivity impact of the autonomous coding tools that appeared starting in December 2025," he wrote.
Mollick added that "every paper out there is from prior to the Claude Code/Codex revolution," calling this "a huge gap in our knowledge about what is happening in coding."
The absence of post-December 2025 productivity studies is striking given the rapid adoption of these tools. Claude Code and OpenAI's Codex agent, the tools Mollick names, are pitched as handling code generation, debugging, and deployment autonomously, capabilities far beyond the copilot-style completions studied in prior literature.
Existing productivity research on AI coding tools, such as GitHub's 2023 study showing 55% faster task completion with Copilot or Microsoft's 2024 paper on developer satisfaction, predates the autonomous agent paradigm. Those studies measured human-in-the-loop code completion, not AI agents independently executing multi-step programming tasks.
This creates a risky blind spot: companies are deploying autonomous coding tools at scale without controlled experiments measuring their actual impact on code quality, bug rates, maintainability, or developer throughput. Vendor claims of 2-3x productivity gains remain unverified by independent academic or industry research.
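For readers unfamiliar with what such a controlled experiment actually computes, the following is a minimal Python sketch, not drawn from any study mentioned here. It assumes developers are randomly assigned to a control group or a tool-assisted group and that each records a task completion time. The function names and the sample times are hypothetical, invented purely for illustration.

```python
import random
import statistics


def percent_faster(control_times, treatment_times):
    """Relative reduction in mean completion time, as a percentage."""
    control_mean = statistics.mean(control_times)
    treatment_mean = statistics.mean(treatment_times)
    return 100.0 * (control_mean - treatment_mean) / control_mean


def permutation_p_value(control_times, treatment_times, n_shuffles=10_000, seed=0):
    """One-sided permutation test.

    Estimates how often a random relabeling of the pooled times yields a
    mean difference at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = statistics.mean(control_times) - statistics.mean(treatment_times)
    pooled = list(control_times) + list(treatment_times)
    n_control = len(control_times)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_control]) - statistics.mean(pooled[n_control:])
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical completion times in minutes (made up, not real measurements).
control = [100, 120, 110, 130]
treatment = [80, 90, 100, 90]

speedup = percent_faster(control, treatment)
p_value = permutation_p_value(control, treatment)
print(f"Mean completion time reduced by {speedup:.1f}%")
print(f"Permutation p-value: {p_value:.3f}")
```

Completion time is only one outcome. A study of autonomous agents would need the same kind of comparison for bug rates and maintainability, which a speed figure alone does not capture.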
Why this gap matters more than the press releases suggest
The problem is structural: autonomous coding tools represent a paradigm shift from copilot to agent, yet the measurement methodology has not evolved. The 2023-2024 studies are not just outdated; they measure a fundamentally different interaction model. Deploying these tools without productivity evidence from the agent era means engineering leaders are making billion-dollar infrastructure decisions based on extrapolation, not data.
Key takeaways
- No productivity studies exist for autonomous coding tools launched December 2025.
- All research predates the Claude Code/Codex revolution, creating a major knowledge gap.
What to watch

Watch for controlled studies from academic labs (MIT, Stanford, Wharton) or from internal Microsoft/GitHub research teams that report post-December 2025 productivity results. A peer-reviewed paper with SWE-bench or HumanEval scores would be an early signal, but those benchmarks measure model capability on fixed tasks; a true productivity study requires longitudinal observation of developers.