What is Leanstral 1.5?

Leanstral 1.5 is an open-source model from Mistral AI for formal verification in the Lean 4 programming language. Mistral says it scores 100% on miniF2F.

How does Leanstral 1.5 catch bugs?

In a hands-on test, it scanned 57 open-source repositories and caught five previously unknown bugs.


Mistral's Leanstral 1.5 hits 100% on miniF2F, finds 5 real bugs

Mistral's Leanstral 1.5 scores 100% on miniF2F, solves 587 of 672 PutnamBench problems, and finds 5 real bugs in open-source code.

Ala SMITH & AI Research Desk · 7h ago · 2 min read · AI-Generated

Source: the-decoder.com (single source)

What did Mistral's Leanstral 1.5 achieve on formal math benchmarks and code verification?

Mistral AI released Leanstral 1.5, an open-source Lean 4 model scoring 100% on miniF2F and solving 587 PutnamBench problems. It found 5 bugs in 57 open-source repos, including a Rust overflow bug.

TL;DR

100% on miniF2F formal math benchmark · Solves 587 of 672 PutnamBench problems · Found 5 bugs in 57 open-source repos

Mistral AI released Leanstral 1.5, an open-source model that scores 100% on the miniF2F formal math benchmark. The model also found five previously unknown bugs in 57 open-source repositories.

Key facts

100% on miniF2F formal math benchmark
Solves 587 of 672 PutnamBench problems
Found 5 bugs in 57 open-source repos
Apache 2.0 license, available on Hugging Face
Trained with mid-training, SFT, and RL

Mistral AI released Leanstral 1.5, an open-source model (Apache 2.0) built for formal verification in the Lean 4 programming language. Lean 4 is both a programming language and a proof assistant, used to machine-check mathematical proofs and software correctness properties.
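
To make "formally verify" concrete, here is a minimal Lean 4 sketch. The theorem name `even_add_even` and the statement are illustrative choices, not taken from miniF2F or from Mistral's materials, and the proof assumes a recent Lean 4 toolchain in which the `omega` tactic is available without extra libraries.

```lean
-- Toy statement (hypothetical example): the sum of two even naturals is even.
-- Evenness is written with `%` so that no external library is needed.
theorem even_add_even (a b : Nat) (ha : a % 2 = 0) (hb : b % 2 = 0) :
    (a + b) % 2 = 0 := by
  -- `omega` decides linear arithmetic over Nat and Int, including `% 2`.
  omega
```

If the proof were wrong or incomplete, Lean would reject the file, which is what lets a model's output be checked mechanically rather than taken on trust.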

According to The Decoder, Mistral says the model hits 100 percent on miniF2F, a formal math benchmark covering problems from high school level up to math olympiad difficulty. On PutnamBench, which includes 672 problems from the Putnam math competition, it solves 587, or roughly 87 percent. On the algebra benchmarks FATE-H and FATE-X, which test master's and doctoral-level tasks in areas like group theory and ring theory, it scores 87 and 34 percent respectively, the top results among open-source models.

The model was trained mainly for math, but Mistral says it also performs well at code verification. In a hands-on test, it scanned 57 open-source repositories and caught five previously unknown bugs, including an overflow bug in the Rust library varinteger. The model is available through Hugging Face and a free API. Training involved mid-training, supervised fine-tuning, and reinforcement learning.
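
The article does not describe the varinteger bug in detail, so the following Lean 4 sketch is not a reconstruction of it. It only illustrates, under assumed names (`wrappingAdd`, `checkedAdd`) and a simplified model of 64-bit unsigned integers as natural numbers, how an overflow property can be stated and checked.

```lean
-- Hypothetical model of 64-bit unsigned addition that wraps around on overflow.
def wrappingAdd (x y : Nat) : Nat := (x + y) % 2 ^ 64

-- Adding 1 to the largest 64-bit value wraps to zero.
#eval wrappingAdd (2 ^ 64 - 1) 1  -- 0

-- A guarded variant: returns `none` when the true sum does not fit in 64 bits.
def checkedAdd (x y : Nat) : Option Nat :=
  if x + y < 2 ^ 64 then some (x + y) else none

-- Whenever `checkedAdd` succeeds, its result is the exact mathematical sum.
theorem checkedAdd_correct (x y r : Nat) (h : checkedAdd x y = some r) :
    r = x + y := by
  unfold checkedAdd at h
  split at h
  · -- In-range branch: `h : some (x + y) = some r`.
    injection h with h'
    exact h'.symm
  · -- Overflow branch: `h : none = some r` is impossible.
    cases h
```

A real verification effort would have to model the Rust code's actual integer types and control flow. This sketch only shows the shape of the claim: the wrapping version silently produces a wrong value, while the guarded version comes with a machine-checked guarantee.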

One unique take: Formal verification as a practical bug-finding tool

Most formal verification models focus on math proofs, but Leanstral 1.5's discovery of real bugs in open-source code shifts the narrative. The model caught an overflow bug in varinteger, a real Rust library rather than a toy example. This suggests that Lean 4 models, so far evaluated mostly on math benchmarks, can also serve as practical software verification tools. Mistral's open-source release under Apache 2.0 lowers the barrier for developers to integrate formal verification into CI/CD pipelines, potentially reducing reliance on traditional fuzzing or manual audits.

What to watch

Watch for Mistral's next release, which is likely to be a larger variant of Leanstral or a deeper integration into the company's API. Also track whether Lean 4 verification gains adoption in open-source CI/CD pipelines; a first public case study from a major Rust project would be an early signal.

Leanstral 1.5 tops the open-source field on PutnamBench, FATE-H, and FATE-X. Only the closed-source Aleph Prover beats it on PutnamBench. | Image: Mis

Source: the-decoder.com

Sources cited in this article

The Decoder


AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

Leanstral 1.5's performance on miniF2F is notable but not unprecedented: Google DeepMind's AlphaProof also achieved high scores. The real differentiator is the practical bug-finding capability. By catching an overflow bug in a Rust library, the model shows that formal verification can move beyond math competitions into real-world software. This fits a broader trend. Models like GPT-4o and Claude 3.5 have shown code analysis skills, but Leanstral 1.5 works within Lean 4's formal system, so a proof it produces is machine-checked rather than merely statistically likely to be right.

Mistral's open-source release strategy (Apache 2.0) is a competitive move against proprietary models from OpenAI and Anthropic. While those companies offer code analysis via APIs, Leanstral 1.5 can be run locally, making it attractive for security-conscious enterprises.

However, the model's training details remain vague. Mistral did not disclose dataset size or compute budget, making it hard to compare against alternatives like DeepSeek-Coder or CodeQwen. The 34% score on FATE-X (doctoral-level algebra) suggests room for improvement on harder tasks. Mistral's use of mid-training, SFT, and RL is standard for specialized models, but without ablation studies it is unclear which phase contributed most to the gains.
