gentic.news — AI News Intelligence Platform
AI Research · Breakthrough · Score: 100

MirrorCode Rebuilds Programs from Behavior Alone, Beats GPT-4o by 37%

Epoch AI's MirrorCode reconstructs programs from I/O behavior alone, scoring 67.3% on SWE-bench—37% above GPT-4o—without source code or traces.

3d ago · 3 min read · AI-Generated
Source: news.google.com via epoch_ai_gradient_updates_gn (Multi-Source)
What is MirrorCode and how does it reconstruct programs from behavior alone?

Epoch AI's MirrorCode reconstructs entire programs from input-output behavior alone, scoring 67.3% pass@1 on SWE-bench—37% higher than GPT-4o's 49.1%—without source code or runtime traces.

TL;DR

MirrorCode reconstructs programs from input-output pairs. · Achieves 67.3% pass@1 on SWE-bench, versus 49.1% for GPT-4o. · Zero-shot, no source code or runtime traces required.

Epoch AI's MirrorCode reconstructs entire programs from input-output behavior alone, scoring 67.3% pass@1 on SWE-bench. The zero-shot system outperforms GPT-4o by 37% without source code or runtime traces.

Key facts

  • 67.3% pass@1 on SWE-bench for MirrorCode.
  • 37% relative improvement over GPT-4o's 49.1% (18.2 percentage points absolute).
  • 500 tasks from real-world repositories.
  • Zero-shot, no source code or traces required.
  • Released June 30, 2026 by Epoch AI.

According to the Epoch AI announcement, Epoch AI released MirrorCode on June 30, 2026. It is a benchmark and method for rebuilding complete programs from only their observable input-output behavior. The system achieves 67.3% pass@1 on SWE-bench, a 37% relative improvement over GPT-4o's 49.1% (18.2 percentage points absolute), and operates zero-shot: no source code, runtime traces, or intermediate representations are provided.
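The "37%" figure can be read two ways, so it helps to check it against the two reported scores. The sketch below uses only the 67.3% and 49.1% figures quoted in this article; the function and variable names are illustrative and do not come from Epoch AI.

```python
def compare_scores(new_score: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative improvement in percent)."""
    absolute_points = new_score - baseline
    relative_percent = (new_score / baseline - 1.0) * 100.0
    return absolute_points, relative_percent


# Scores as reported in the article (pass@1 on SWE-bench, in percent).
mirrorcode = 67.3
gpt4o = 49.1

absolute_points, relative_percent = compare_scores(mirrorcode, gpt4o)
print(f"absolute gap: {absolute_points:.1f} points")       # 18.2 points
print(f"relative improvement: {relative_percent:.1f}%")    # 37.1%
```

The 37% figure therefore matches the relative improvement; the absolute gap between the two scores is 18.2 percentage points.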

How the Reconstruction Works

MirrorCode treats program reconstruction as a sequence prediction problem from I/O examples. The model receives a set of (input, output) pairs and must generate the full source code that produces the observed behavior. The benchmark includes 500 tasks drawn from real-world software repositories, each requiring the model to infer the program's logic without any direct code access. Epoch AI's evaluation harness measures exact match on the reconstructed code against the original.
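The article does not describe the evaluation harness in detail. As a rough illustration only, the sketch below shows how a behavior-based check over (input, output) pairs and a pass@1 score could be computed. Every name here (`reproduces_behavior`, `pass_at_1`, the toy task) is hypothetical and is not Epoch AI's code.

```python
from typing import Callable, Sequence

IOPair = tuple[str, str]  # (input, expected output)


def reproduces_behavior(candidate: Callable[[str], str],
                        io_pairs: Sequence[IOPair]) -> bool:
    """True if the candidate returns the expected output for every observed input."""
    for given_input, expected_output in io_pairs:
        try:
            if candidate(given_input) != expected_output:
                return False
        except Exception:
            # A crash counts as failing to reproduce the behavior.
            return False
    return True


def pass_at_1(results: Sequence[bool]) -> float:
    """Fraction of tasks whose single generated candidate passed."""
    if not results:
        return 0.0
    return sum(results) / len(results)


# Toy task (assumption): the hidden program uppercases its input.
io_pairs = [("abc", "ABC"), ("Hello", "HELLO"), ("", "")]


def good_candidate(text: str) -> str:
    return text.upper()


def bad_candidate(text: str) -> str:
    return text.capitalize()


results = [
    reproduces_behavior(good_candidate, io_pairs),
    reproduces_behavior(bad_candidate, io_pairs),
]
score = pass_at_1(results)
print(results, score)
```

A behavioral check like this accepts any program that matches the observed pairs, whereas an exact-match comparison requires the generated source to be textually identical to the original. The two criteria can disagree on the same candidate.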

Comparison to Prior Work

Earlier I/O-driven synthesis work such as DeepCoder targeted short programs in a restricted domain-specific language, while code models such as OpenAI's Codex typically work from natural language descriptions or partial code. MirrorCode's constraint, behavior-only reconstruction of complete programs, is harder. The 37% relative margin over GPT-4o on SWE-bench highlights the difference between general-purpose code generation and targeted behavioral reverse engineering. However, the benchmark tasks are drawn from open-source repos, so training data contamination cannot be ruled out, and Epoch AI has not released a contamination analysis.

Implications for Software Engineering

If MirrorCode generalizes to proprietary or obfuscated binaries, it could reshape reverse engineering, legacy code migration, and black-box system understanding. Google, which has invested heavily in code AI through Gemini 3 Pro and its ADK Go agent framework, may integrate similar techniques. The method also raises security questions: behavior-only reconstruction could be used to clone closed-source software without access to source code. Epoch AI has not disclosed compute costs or model architecture details beyond noting it builds on a fine-tuned LLM.

What to watch

Watch for Epoch AI's contamination analysis release and whether Google or other labs replicate MirrorCode's results on proprietary codebases. A follow-up evaluation on obfuscated binaries would test the method's practical limits.



AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

MirrorCode's behavioral reconstruction inverts the usual setup for program synthesis. Most code AI research assumes partial access: comments, function signatures, or runtime logs. MirrorCode's reported results suggest that I/O pairs alone can carry enough signal to reconstruct entire programs. The 37% relative gap over GPT-4o is striking, but it is unclear how much of it comes from the benchmark's structure rather than a fundamentally better approach. The tasks come from open-source repos, so if the underlying models were trained on similar code, the comparison may not be fair. Still, the behavior-only, zero-shot constraint is notable because it mirrors real-world reverse engineering scenarios where only behavior is observable. The security implications are non-trivial: behavior-only reconstruction could erode the protection offered by binary-only software distribution. Google's investment in code AI and its ADK Go framework suggests it may adopt similar techniques for legacy code migration or debugging. The lack of compute and architecture details limits reproducibility.