Evolver: How AI-Driven Evolution Is Creating GPT-5-Level Performance Without Training


Imbue's newly open-sourced Evolver tool uses LLM-guided evolutionary algorithms to automatically optimize code and prompts, reportedly reaching 95% on the ARC-AGI-2 benchmark, performance the announcement likens to a hypothetical GPT-5.2 model. The approach requires no gradient descent and dramatically reduces optimization costs.

Feb 28, 2026 · 4 min read · via @LiorOnAI

Evolver: AI That Evolves Itself Through Targeted Mutation

AI research company Imbue has open-sourced Evolver, a tool that applies evolutionary algorithms powered by large language models to automatically improve code and prompts. The system reportedly achieves 95% accuracy on the challenging ARC-AGI-2 benchmark, a level that researchers compare to what might be expected from a hypothetical GPT-5.2 model.

How Evolutionary AI Works

Evolver operates on principles inspired by biological evolution but enhanced with artificial intelligence. The system requires three components: starting code or a prompt, a scoring mechanism to evaluate results, and an LLM capable of suggesting improvements. Unlike traditional evolutionary algorithms that rely on random mutations, Evolver employs LLMs to propose targeted fixes based on observed failures.

"Evolver works like natural selection for code," explains the announcement from Imbue. "You give it three things: starting code or prompt, a way to score results, and an LLM that suggests improvements. Then it runs in a loop. It picks high-scoring solutions, mutates them, tests the mutations, and keeps what works."

The key innovation lies in how the system learns from failure. When a solution fails on specific inputs, the LLM analyzes those failures and suggests modifications to address them. While most suggestions prove ineffective, the occasional successful mutation becomes the foundation for subsequent generations of improvements.
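The select-mutate-test loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Imbue's implementation: the "program" is a single integer, the "LLM" is a stub that reads the failure report and proposes a targeted fix, and `score`, `failures`, and `propose_fix` are all hypothetical names invented for the example.

```python
def score(candidate, target=10):
    """Toy scorer: 0 is perfect, more negative is worse."""
    return -abs(candidate - target)

def failures(candidate, target=10):
    """Toy failure report that the mutator gets to read."""
    return [] if candidate == target else [f"off by {candidate - target}"]

def propose_fix(candidate, fail_list):
    """Stand-in for an LLM call: reads the observed failure and proposes
    a targeted (not random) fix, here a one-step nudge toward the target."""
    if not fail_list:
        return candidate
    off = int(fail_list[0].split()[-1])  # parse "off by N"
    return candidate - 1 if off > 0 else candidate + 1

def evolve(seed, generations=15, keep=4):
    population = [seed]
    for _ in range(generations):
        # Select: keep only the highest-scoring candidates.
        population = sorted(population, key=score, reverse=True)[:keep]
        # Mutate and test: children join the pool; selection next round
        # retains only the mutations that actually score well.
        population += [propose_fix(p, failures(p)) for p in population]
    return max(population, key=score)

best = evolve(seed=0)  # climbs from 0 toward the target, 10
```

Because the stub mutator is failure-driven rather than random, the best candidate improves every generation, which is the point of the design: targeted mutation converges far faster than blind mutation.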

Technical Innovations and Efficiency Gains

Evolver incorporates several sophisticated optimizations that distinguish it from simpler evolutionary approaches:

  • Batch mutations: the system addresses multiple failures simultaneously rather than iterating through problems one at a time.
  • Learning logs: discoveries made in one evolutionary branch inform improvements in others, creating a form of cross-pollination between solution paths.
  • Post-mutation filters: unpromising mutations are screened out before they undergo costly scoring processes.

"The verification step alone cuts costs 10x," notes the announcement, highlighting the economic advantage of this approach over brute-force testing of every mutation.
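A post-mutation filter of the kind credited with that cost reduction can be as simple as a cheap static check run before the expensive scorer. The sketch below is a hedged illustration: `quick_check` and `filtered_score` are hypothetical names, `len` stands in for a real scoring function, and in practice the expensive step would run each candidate against the full benchmark.

```python
def quick_check(mutation: str) -> bool:
    """Cheap screen: discard mutations that cannot possibly work before
    paying for full evaluation. Here: the candidate must at least parse."""
    try:
        compile(mutation, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def filtered_score(mutations, expensive_score):
    """Run the costly scorer only on mutations that pass the cheap filter."""
    return {m: expensive_score(m) for m in mutations if quick_check(m)}

candidates = [
    "def f(x): return x + 1",  # valid Python: goes on to full scoring
    "def f(x) return x + 1",   # syntax error: rejected by the cheap filter
]
scores = filtered_score(candidates, expensive_score=len)
```

If most LLM-proposed mutations are duds, as the article notes, rejecting them before scoring is where the order-of-magnitude savings would come from.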

Applications Beyond Traditional Optimization

The versatility of Evolver stems from its minimal requirements: any problem where LLMs can comprehend the code and where outputs can be scored becomes a candidate for optimization. This opens up numerous applications:

  • Agentic workflows: Complex multi-step processes involving multiple AI agents can be systematically improved
  • Prompt templates: The perennial challenge of prompt engineering becomes automated through evolutionary refinement
  • Code performance: Existing codebases can be optimized for speed, memory usage, or other metrics
  • Reasoning chains: Logical sequences and decision-making processes can be refined for accuracy and efficiency
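Anything that can be scored fits the same interface. As a hedged illustration of the prompt-template case, the sketch below scores a template by the fraction of toy test cases a stubbed model answers correctly; `stub_model`, `score_prompt`, and the test cases are all invented for the example, and a real setup would call an actual LLM API instead.

```python
TEST_CASES = [("2+2", "4"), ("3*3", "9")]

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM: evaluates the arithmetic after the last colon."""
    return str(eval(prompt.split(":")[-1].strip()))

def score_prompt(template: str) -> float:
    """Fraction of test cases answered correctly: the fitness signal an
    evolutionary loop would maximize when mutating the template."""
    correct = sum(
        stub_model(template.format(question=q)) == expected
        for q, expected in TEST_CASES
    )
    return correct / len(TEST_CASES)

score_prompt("Answer concisely: {question}")  # → 1.0 on the toy cases
```

With a scorer like this in hand, the evolutionary loop can mutate the template text itself and keep whichever wording scores highest.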

Perhaps most remarkably, this optimization occurs without traditional machine learning techniques. "No gradient descent needed. No differentiable functions required," emphasizes the announcement, highlighting how Evolver bypasses some of the most computationally expensive aspects of conventional AI improvement.

Benchmark Performance and Implications

The 95% score on ARC-AGI-2 (Abstraction and Reasoning Corpus for AGI) represents a breakthrough in automated reasoning. This benchmark, designed to test abstract reasoning capabilities similar to human intelligence, has proven challenging for even advanced AI systems. Achieving near-perfect performance through evolutionary optimization rather than massive model training suggests new pathways toward artificial general intelligence.

This performance level, described as "GPT-5.2-level" by researchers, indicates that sophisticated optimization of existing models may yield capabilities previously thought to require next-generation architectures. The implication is profound: we may be able to extract significantly more capability from current AI models than previously realized through intelligent optimization techniques.

The Future of AI Development

Evolver's open-source release democratizes access to advanced AI optimization techniques that were previously confined to well-resourced research organizations. By making this technology publicly available, Imbue enables broader experimentation with evolutionary approaches to AI improvement.

The system represents a shift toward what might be called "meta-optimization"—using AI to improve AI systems themselves. This recursive improvement loop, where each generation of AI helps create better versions of itself, accelerates progress while potentially reducing computational costs compared to training ever-larger models from scratch.

As AI systems become increasingly complex, tools like Evolver that can automatically refine and optimize these systems will become essential components of the AI development toolkit. The combination of evolutionary algorithms with large language models creates a powerful synergy that may define the next phase of artificial intelligence advancement.

Source: Imbue announcement via @LiorOnAI on X

AI Analysis

Evolver represents a significant conceptual shift in how we approach AI optimization. Rather than viewing improvement as primarily a function of model scale or training data quantity, this approach demonstrates that intelligent optimization of existing systems can yield dramatic performance gains. The 95% score on ARC-AGI-2 is particularly noteworthy because this benchmark tests abstract reasoning capabilities that have proven challenging for even the most advanced AI systems.

The economic implications are substantial. By reducing optimization costs by an order of magnitude through verification filtering and eliminating the need for gradient descent, Evolver makes sophisticated AI improvement accessible to organizations without massive computational resources. This could accelerate AI adoption and innovation across industries.

Perhaps most importantly, Evolver exemplifies a growing trend toward AI systems that improve themselves. This recursive self-improvement capability, when combined with the system's open-source availability, could lead to rapid, distributed advancement in AI capabilities. The approach suggests that we may be entering an era where how we optimize AI systems becomes as important as the underlying architectures themselves.
Original source: x.com
