Epoch AI's MirrorCode reconstructs entire programs from input-output behavior alone, scoring 67.3% pass@1 on SWE-bench. The zero-shot system beats GPT-4o's 49.1% by 18.2 percentage points (about 37% in relative terms) without source code or runtime traces.
Key facts
- 67.3% pass@1 on SWE-bench for MirrorCode.
- 18.2 percentage points above GPT-4o's 49.1%, a relative improvement of about 37%.
- 500 tasks from real-world repositories.
- Zero-shot, no source code or traces required.
- Released June 30, 2026 by Epoch AI.
Epoch AI released MirrorCode on June 30, 2026, according to the organization's announcement. MirrorCode is a benchmark and method for rebuilding complete programs from only their observable input-output behavior. The system achieves 67.3% pass@1 on SWE-bench, 18.2 percentage points above GPT-4o's 49.1% (about 37% in relative terms). It operates zero-shot: no source code, runtime traces, or intermediate representations are provided.
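The relationship between the two quoted scores is easy to misread, so here is a quick arithmetic check. It uses only the 67.3% and 49.1% figures from the announcement; the variable names are illustrative.

```python
# Scores quoted in the article, in percent.
mirrorcode = 67.3
gpt4o = 49.1

# Absolute gap, measured in percentage points.
absolute_gap = mirrorcode - gpt4o

# Relative gain, measured as a percentage of the GPT-4o baseline.
relative_gain = absolute_gap / gpt4o * 100

print(f"absolute gap:  {absolute_gap:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The gap is 18.2 percentage points in absolute terms, and the roughly 37% figure is that gap expressed relative to GPT-4o's score.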
How the Reconstruction Works

MirrorCode treats program reconstruction as a sequence prediction problem from I/O examples. The model receives a set of (input, output) pairs and must generate the full source code that produces the observed behavior. The benchmark includes 500 tasks drawn from real-world software repositories, each requiring the model to infer the program's logic without any direct code access. Epoch AI's evaluation harness measures exact match on the reconstructed code against the original.
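To make the behavior-only setup concrete, here is a minimal sketch of how a candidate reconstruction could be checked against observed (input, output) pairs and how a pass@1 rate could be tallied. This is not Epoch AI's harness, which the article describes as measuring exact match against the original code. The names `behaves_like` and `pass_at_1` and the toy squaring task are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Iterable, Sequence, Tuple

# One observation of the hidden program: (input, output).
IOPair = Tuple[Any, Any]


def behaves_like(candidate: Callable[[Any], Any], io_pairs: Iterable[IOPair]) -> bool:
    """Return True if the candidate reproduces every observed output."""
    for given, expected in io_pairs:
        try:
            if candidate(given) != expected:
                return False
        except Exception:
            # A crash on an observed input counts as a behavioral mismatch.
            return False
    return True


def pass_at_1(first_attempt_passed: Sequence[bool]) -> float:
    """Fraction of tasks whose single (first) attempt passed."""
    if not first_attempt_passed:
        return 0.0
    return sum(first_attempt_passed) / len(first_attempt_passed)


# Toy task (assumption): the hidden program squares its input.
observed = [(0, 0), (1, 1), (2, 4), (3, 9)]


def reconstructed_square(x: int) -> int:
    return x * x


def wrong_guess(x: int) -> int:
    return x + x


# Treat each candidate as the first attempt on a separate task.
results = [
    behaves_like(reconstructed_square, observed),
    behaves_like(wrong_guess, observed),
]
print(results, pass_at_1(results))
```

A behavioral check like this accepts any program that matches the observations, which is a looser criterion than exact match on source text. Two programs can agree on every observed pair and still differ on unseen inputs.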
Comparison to Prior Work
Existing approaches to program synthesis (e.g., DeepCoder, OpenAI's Codex) typically require partial code sketches, natural language descriptions, or runtime traces. MirrorCode's constraint, behavior-only reconstruction, is strictly harder. The 18.2-point lead over GPT-4o on SWE-bench underscores the distance between general-purpose code generation and targeted behavioral inverse engineering. However, the benchmark tasks are drawn from open-source repos, so training data contamination cannot be ruled out, and Epoch AI has not released a contamination analysis.
Implications for Software Engineering

If MirrorCode generalizes to proprietary or obfuscated binaries, it could reshape reverse engineering, legacy code migration, and black-box system understanding. Google, which has invested heavily in code AI through Gemini 3 Pro and its ADK Go agent framework, may integrate similar techniques. The method also raises security questions: behavior-only reconstruction could be used to clone closed-source software without access to source code. Epoch AI has not disclosed compute costs or model architecture details beyond noting it builds on a fine-tuned LLM.
What to watch
Watch for Epoch AI's contamination analysis release and whether Google or other labs replicate MirrorCode's results on proprietary codebases. A follow-up evaluation on obfuscated binaries would test the method's practical limits.
Source: news.google.com







