gentic.news — AI News Intelligence Platform
Open Source · Breakthrough · Score: 88

Mistral's Leanstral 1.5 hits 100% on miniF2F, finds 5 real bugs

Mistral's Leanstral 1.5 scores 100% on miniF2F, solves 587 Putnam problems, and finds 5 real bugs in open-source code.

7h ago · 2 min read · AI-Generated
Source: the-decoder.com, via the_decoder (single source)
What did Mistral's Leanstral 1.5 achieve on formal math benchmarks and code verification?

Mistral AI released Leanstral 1.5, an open-source Lean 4 model scoring 100% on miniF2F and solving 587 PutnamBench problems. It found 5 bugs in 57 open-source repos, including a Rust overflow bug.

TL;DR

100% on miniF2F formal math benchmark · Solves 587 of 672 Putnam problems · Found 5 bugs in 57 open-source repos

Mistral AI released Leanstral 1.5, an open-source model that scores 100% on the miniF2F formal math benchmark. The model also found five previously unknown bugs in 57 open-source repositories.

Key facts

  • 100% on miniF2F formal math benchmark
  • Solves 587 of 672 Putnam problems
  • Found 5 bugs in 57 open-source repos
  • Apache 2.0 license, available on Hugging Face
  • Trained with mid-training, SFT, and RL

Mistral AI released Leanstral 1.5, an open-source model (Apache 2.0) built for formal verification in Lean 4. Lean 4 is both a programming language and a proof assistant: statements about mathematics or software are written as types, and a proof counts only if Lean's checker accepts it.
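For readers unfamiliar with Lean 4, the following is a minimal sketch of a formal statement and its machine-checked proof. It is an illustration only, not output from Leanstral 1.5 and not a benchmark problem; the theorem name `add_comm_example` is made up for this example, while `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- A statement and its proof: addition on natural numbers is commutative.
-- `add_comm_example` is a hypothetical name chosen for this sketch.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, accepted because both sides evaluate to the same value.
example : 2 + 2 = 4 := rfl
```

If either proof were wrong, Lean would reject the file. That all-or-nothing check is what lets benchmarks such as miniF2F score a model's answers automatically.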

According to The Decoder, Mistral says the model hits 100 percent on miniF2F, a formal math benchmark covering problems from high school level up to math olympiad difficulty. On PutnamBench, which includes 672 problems from the Putnam math competition, it solves 587 (about 87 percent). On the algebra benchmarks FATE-H and FATE-X, which test master's and doctoral-level tasks in areas like group theory and ring theory, it scores 87 and 34 percent respectively, the top results among open-source models.

The model was trained mainly for math, but Mistral says it also performs well at code verification. In a hands-on test, it scanned 57 open-source repositories and caught five previously unknown bugs, including an overflow bug in the Rust library varinteger. The model is available through Hugging Face and a free API. Training involved mid-training, supervised fine-tuning, and reinforcement learning.

One unique take: Formal verification as a practical bug-finding tool

Most formal verification models focus on math proofs, but Leanstral 1.5 also found real bugs in open-source code, including an overflow bug in the Rust library varinteger. That is a published library, not a toy example. This suggests that Lean 4 models, previously confined to academic math problems, can serve as practical software verification tools. Mistral's open-source release under Apache 2.0 lowers the barrier for developers to integrate formal verification into CI/CD pipelines, potentially reducing reliance on traditional fuzzing or manual audits.
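To give a sense of how an overflow property can be stated formally, here is a hedged Lean 4 sketch. It is a simplified, hypothetical model (a byte as a natural number below 256). It is not the varinteger bug, whose details the source does not describe, and it is not Leanstral 1.5 output. The name `sum_fits_in_byte` is invented for this example.

```lean
-- Hypothetical model: a byte holds values below 256. Natural-number addition
-- does not wrap, so "overflow" shows up as a sum leaving the byte range.

-- Counterexample: two in-range values whose sum no longer fits in a byte.
example : ∃ a b : Nat, a < 256 ∧ b < 256 ∧ ¬ (a + b < 256) :=
  ⟨255, 1, by decide⟩

-- Safe case: if both inputs stay below 128, the sum provably fits in a byte.
theorem sum_fits_in_byte (a b : Nat) (ha : a < 128) (hb : b < 128) :
    a + b < 256 := by
  omega
```

The first statement shows that the unguarded property fails, and the second shows a precondition under which it holds. A verification workflow looks for gaps of this kind between what code assumes and what its inputs actually guarantee.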

What to watch

Watch for Mistral's next release, likely a larger variant of Leanstral or integration into their API. Also track whether open-source adoption of Lean 4 verification in CI/CD pipelines increases; a first public case study from a major Rust project would be an early signal.

Leanstral 1.5 tops the open-source field on PutnamBench, FATE-H, and FATE-X. Only the closed-source Aleph Prover beats it on PutnamBench. | Image: Mis





AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

Leanstral 1.5's performance on miniF2F is notable but not unprecedented: Google's AlphaProof has also posted high scores on formal math tasks. The real differentiator is the practical bug-finding capability. By catching an overflow bug in a Rust library, the model demonstrates that formal verification can move beyond math competitions into real-world software. This aligns with a broader trend. Models like GPT-4o and Claude 3.5 have shown code analysis skills, but their findings are statistical judgments, whereas a proof accepted by Lean 4's checker is a machine-checked guarantee relative to the stated specification.

Mistral's open-source release strategy (Apache 2.0) is a competitive move against proprietary models from OpenAI and Anthropic. While those companies offer code analysis via APIs, Leanstral 1.5 can be run locally, making it attractive for security-conscious enterprises.

However, the model's training details remain vague. Mistral did not disclose dataset size or compute budget, making it hard to compare against alternatives like DeepSeek-Coder or CodeQwen. The 34% score on FATE-X (doctoral-level algebra) suggests room for improvement on harder tasks. Mistral's use of mid-training, SFT, and RL is standard for specialized models, but without ablation studies it is unclear which phase contributed most to the gains.