Listen to today's AI briefing

Daily podcast — 5 min, AI-narrated summary of top stories

ASI-Evolve: This AI Designs Better AI Than Humans Can — 105 New Architectures, Zero Human Guidance

Researchers built an AI that runs the entire research cycle on its own — reading papers, designing experiments, running them, and learning from results. It discovered 105 architectures that beat human-designed models, and invented new learning algorithms. Open-sourced.

AAAla SMITH & AI Research Desk·Apr 5, 2026·9 min read··704 views·AI-Generated·Report error

Source: arxiv.orgvia arXivSingle Source

TL;DR

ASI-Evolve autonomously runs the full AI research loop. It found 105 architectures beating human designs and invented RL algorithms outperforming GRPO by +12.5 points.

An AI That Does AI Research By Itself

A team in Shanghai just built something that should get every AI researcher's attention.

ASI-Evolve is an AI system that does AI research — autonomously. It reads scientific papers, comes up with ideas, designs experiments, runs them, analyzes results, and uses what it learned to do better next time. Over and over. Without anyone guiding it.

They pointed it at three fundamental challenges in AI, and it outperformed humans on all three.

ASI-Evolve covers three pillars of AI: Architecture, Data, and Algorithms — with results that beat human researchers across all three.

The system is fully open-sourced on GitHub.

Challenge 1: Design a Better Neural Architecture

Human researchers spent years improving linear attention models. The best recent human gain was +0.34 points (Mamba2 over DeltaNet).

ASI-Evolve was given the same task. Over 1,773 autonomous rounds, it:

Generated 1,350 candidate architectures
Found 105 that beat the human baseline
The best one scored +0.97 points — nearly 3x the human improvement
On development benchmarks: 57.28% average accuracy vs DeltaNet's 55.76%
On generalization (out-of-distribution): 45.40% vs DeltaNet's 44.74%

The AI discovered a consistent pattern: adaptive routing that adjusts computation based on input content. Five standout architectures include PathGateFusionNet (hierarchical budget allocation), ContentSharpRouter (learnable temperature routing), and AdaMultiPathGateNet (token-level sparse gating with entropy penalties to prevent mode collapse).

The evaluation was rigorous: small models (~20M params) explored first, then promising candidates scaled to 340M params, and top architectures validated at **1.3B parameters on 100B tokens** across 16 benchmarks.

Challenge 2: Improve Training Data

Bad training data = bad models. Cleaning data at scale is tedious, expensive, and hard to get right.

ASI-Evolve designed its own data cleaning strategies for Nemotron-CC, a 672-billion-token pretraining corpus spanning math, computer science, medicine, and STEM. The results:

+18.64 points on MMLU (the standard knowledge benchmark)
+18.80 points on CSQA (commonsense reasoning)
+13.48 points on MedQA (medical knowledge)
+3.96 points average across 18 benchmarks

Same 3B-parameter model. Same 500B training tokens. The only difference was the AI-designed data curation — and it crushed every human-designed strategy including DCLM, FineWeb-Edu, and Ultra-FineWeb.

The system converged on cleaning-focused approaches without being told what to do: targeted noise removal (HTML artifacts, duplicates, PII), format normalization, and domain-aware preservation rules. The 2.93-point gap between best and worst AI strategies shows iterative refinement matters — it's not one-shot generation.

Challenge 3: Invent a Better Learning Algorithm

This is the hardest one. Designing how models learn requires deep mathematical reasoning.

ASI-Evolve was told to improve on GRPO (Group Relative Policy Optimization), the leading RL method for LLM training. Over 300 evolutionary rounds using Qwen-3-14B, it invented 10 new algorithms that beat GRPO, with the best gaining:

+12.5 points on AMC32 (67.5 to 80.0)
+11.67 points on AIME24 (20.00 to 31.67)
+5.04 points on OlympiadBench (45.92 to 50.96)

Two standout algorithms show genuine theoretical innovation:

Algorithm A (Pairwise Asymmetric Optimization) — instead of comparing against a group mean, it calculates advantage by averaging tanh-normalized pairwise reward differences. It uses an asymmetric clipping window that adjusts based on advantage sign, plus "High-Impact Gradient Dropout" that masks gradients for the most influential tokens to prevent overfitting.

Algorithm B (Budget-Constrained Dynamic Radius) — uses percentile-based normalization and a "Global Update Budget" that mathematically guarantees total policy update magnitude stays within bounds, stabilizing training on noisy data.

ASI-Evolve benchmark results: AI-discovered architectures vs human-designed baselines across development and generalization benchmarks.

It Works in Medicine Too

To test real-world transfer, they applied an ASI-Evolve architecture to drug-target interaction (DTI) prediction — a core problem in AI-driven drug discovery.

Starting from the DrugBAN architecture and initialized with ~80 papers on graph neural networks and molecular modeling, the system evolved over 100+ rounds. Results across 4 datasets (BindingDB, BioSNAP, Human, C.elegans):

+1.91 AUROC on BindingDB (94.15 to 96.06)
+6.94 AUROC for unseen drugs (79.15 to 86.09) — the cold-start scenario
+3.56 AUROC for unseen proteins (82.26 to 85.82)
Beat all 6 human-designed baselines including TransformerCPI, PSICHIC, and ColdStartCPI

The best architecture introduced three innovations: Sinkhorn Attention (optimal-transport-based attention preventing collapse), Domain-Specific Marginalization (separate aggregation over drug and protein substructures), and Top-k Sparse Gating (learnable selection focusing on relevant interaction patterns).

This proves ASI-Evolve's designs aren't just AI-benchmark tricks — they carry real scientific value.

How It Works

ASI-Evolve follows a learn-design-experiment-analyze cycle:

The ASI-Evolve pipeline: Cognition base feeds the Researcher, which generates candidates for the Engineer to run. The Analyzer distills results back into the Database.

Cognition — a knowledge base initialized with insights from ~100-150 research papers. Provides human prior knowledge so it doesn't explore blindly.
Researcher — samples context from past experiments, retrieves relevant cognition items via embedding search, generates a new candidate with a natural-language motivation.
Engineer — runs the experiment. Includes a static check agent (validates before expensive training), a debug agent (handles runtime errors), and a novelty check (filters duplicates).
Analyzer — receives the full experimental output (loss dynamics, benchmark breakdowns, efficiency traces) and distills it into a compact report. This is key — not just a scalar score, but structured feedback.
Database — stores everything: motivation, code, results, analysis. Supports multiple sampling strategies (UCB1, greedy, MAP-Elites island, random).

Ablation: What Actually Matters?

The team ran controlled ablation studies:

Without the Analyzer: The system starts well (thanks to cognition) but hits a plateau. Without structured feedback, improvements become sporadic. The Analyzer's ability to interpret multi-dimensional experimental results is critical for sustained progress.

Without the Cognition base: Cold-start is much slower. The system takes longer to find productive regions. But it still evolves — proving the core learn-experiment-analyze loop works even without human priors, just slower.

Sampling strategy matters: UCB1 (exploitation-heavy) combined with the cognition base reached SOTA on circle packing in just 17 steps. MAP-Elites (diversity-preserving) needed 79 steps for the same score. With good priors, you can be greedy.

The framework also works across different base models: GPT-5-mini and Qwen3-32B both converge to similar performance, showing the evolution capability isn't tied to a specific model family.

Speed: Fastest Evolutionary Framework

On the circle packing benchmark (a standard test for evolutionary frameworks):

AlphaEvolve Gemini 2.0 Flash + Claude 3.7 — 2.6359 OpenEvolve Gemini 2.0 Flash + Claude 3.7 460 2.6343 LoongFlow DeepSeek-R1 — 2.6360 SkyDiscover GPT-5 89 2.6360 ASI-Evolve GPT-5-mini 17 2.6360

ASI-Evolve reaches SOTA in 17 rounds — the fastest. And it uses a cheaper model (GPT-5-mini vs GPT-5 or Gemini+Claude combos).

Why This Matters

This is the first system to demonstrate AI-driven discovery across all three pillars of AI development — architecture, data, and algorithms — in a single framework.

Among the 105 winning architectures:

51.7% built on the cognition base (human prior knowledge)
38.2% emerged from accumulated experience (the system's own past experiments)
10.1% were genuinely novel

As evolution proceeds, experience-derived designs rise to 44.8% while novelty drops to 6.6% — the system progressively distills its own useful patterns.

The recursive loop is closed. AI is building AI. And on these benchmarks, it's already better at it than we are.

Caveats

Still needs human-curated initialization (100-150 papers)
Operates at the mechanism level, not hardware-optimized CUDA kernels — wall-clock efficiency of discovered architectures after full optimization is unvalidated
Each experiment costs real GPU hours (architecture search = 2000 training steps per candidate at ~20M params)
LLM-as-a-Judge scores penalize computationally expensive designs, which could bias against some viable architectures
The +0.97 architecture gain is in a high-saturation regime where any improvement is hard

Open Source

Code: github.com/GAIR-NLP/ASI-Evolve
Paper: arxiv.org/abs/2603.29640
Team: Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu — Shanghai Jiao Tong University (SJTU), SII, GAIR

Frequently Asked Questions

What is ASI-Evolve?

ASI-Evolve is an open-source agentic framework for AI-for-AI research from Shanghai Jiao Tong University. It autonomously runs the full scientific research loop — reading papers, forming hypotheses, designing experiments, executing them, and analyzing results. It discovered 105 neural architectures better than human designs, improved training data by 18 points on MMLU, and invented RL algorithms outperforming GRPO by 12.5 points on competition math.

Can ASI-Evolve replace human AI researchers?

Not yet. It needs human-curated knowledge to start (insights from 100+ papers) and operates at the mechanism design level. About 52% of its best discoveries built directly on human prior knowledge. It augments human research rather than replacing it — but the gap is narrowing.

Is ASI-Evolve open source?

Yes, fully open-sourced at github.com/GAIR-NLP/ASI-Evolve including code, cognition base, and all experimental configurations.

How does ASI-Evolve compare to AlphaEvolve and OpenEvolve?

ASI-Evolve is broader (covers architecture + data + algorithms vs code-level optimization), faster (SOTA in 17 rounds vs 460+ for OpenEvolve), and uses a cheaper model (GPT-5-mini vs Gemini+Claude combos). On circle packing, ASI-Evolve matches or exceeds all prior frameworks while being the fastest to converge.

What base models does ASI-Evolve use?

The framework works with multiple LLMs. Results were demonstrated with GPT-5-mini and Qwen3-32B, both converging to similar performance. The evolution capability is not tied to a specific model family.

Does ASI-Evolve work outside of AI research?

Yes. When applied to drug-target interaction prediction in biomedicine, an ASI-Evolve-designed architecture achieved +6.94 AUROC improvement for unseen drugs, beating all human-designed baselines across 4 benchmark datasets.

Source: gentic.news · Apr 5, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

First unified AI-for-AI system across architecture, data, and algorithms. Cognition base is the key innovation. Open-sourced.

This story is part of

Anthropic's MCP Gambit: Building a Developer Ecosystem While Rivals Stumble

Claude Code's security-first approach and Model Context Protocol create a convergence point as GitHub, OpenAI, and standalone coding tools show vulnerability.

Mentioned in this article

ASI-Evolve GAIR-NLP DeltaNet Mamba2 GitHub

Enjoyed this article?

Get the weekly AI intelligence briefing

✨AI Toolslive

Five one-click lenses on this article. Cached for 24h.

Pick a tool above to generate an instant lens on this article.

AI Research

ByteDance Finds AI Agents Double Learning Speed Every 3 Months

AI Research

Alibaba's Damo Academy AI Agent Discovers 4 New Superconductors in 28 Hours

AI Research

Mira Murati's Thinking Machines beats frontier models by 29.8% with Bridgewater's expert judgments

AI Research

Epoch AI's EBR-Bench: Top Models Score 30-50% on Experience-Based Reasoning

AI Research

Google TPU Humufish Drops TSMC CoWoS for Intel EMIB-T

AI Research

NVIDIA Blackwell Cuts DeepSeek V4 Token Costs 5x in One Month

From the lab

The framework underneath this story

Every article on this site sits on top of one engine and one framework — both built by the lab.

Original research · EUMAS 2026

MNEMA — A Witness Lattice for Multi-Agent AI Memory

Cryptographic memory units · 1−α detection floor · 15 pp PDF

Field framework · v1.0

Epistemic Infrastructure

12 pillars · 11-stage knowledge metabolism · pathology catalog

ASI-Evolve: This AI Designs Better AI Than Humans Can — 105 New Architectures, Zero Human Guidance

An AI That Does AI Research By Itself

Challenge 1: Design a Better Neural Architecture

Challenge 2: Improve Training Data

Challenge 3: Invent a Better Learning Algorithm

It Works in Medicine Too

How It Works

Ablation: What Actually Matters?

Speed: Fastest Evolutionary Framework

Why This Matters

Caveats

Open Source

Frequently Asked Questions

What is ASI-Evolve?

Can ASI-Evolve replace human AI researchers?

Is ASI-Evolve open source?

How does ASI-Evolve compare to AlphaEvolve and OpenEvolve?

What base models does ASI-Evolve use?

Does ASI-Evolve work outside of AI research?

AI Analysis

✨AI Toolslive

Related Articles

ByteDance Finds AI Agents Double Learning Speed Every 3 Months

Alibaba's Damo Academy AI Agent Discovers 4 New Superconductors in 28 Hours

Mira Murati's Thinking Machines beats frontier models by 29.8% with Bridgewater's expert judgments

Epoch AI's EBR-Bench: Top Models Score 30-50% on Experience-Based Reasoning

Google TPU Humufish Drops TSMC CoWoS for Intel EMIB-T

NVIDIA Blackwell Cuts DeepSeek V4 Token Costs 5x in One Month

The framework underneath this story

More in AI Research

Hugging Face Papers: 35B Agent Matches Trillion-Parameter Performance

Alibaba's Qwen-RobotNav Unifies Robot Navigation in One 2B-8B Model

Tencent Hunyuan GEAR: 10× Faster Autoregressive Image Gen