gentic.news — AI News Intelligence Platform

Microsoft Unveils MAI-Thinking-1: 35B Active, 1T Parameters, 97% on AIME 2025

Microsoft's MAI-Thinking-1 hits 97% on AIME 2025 with 35B active params in a 1T MoE model, trained on 30T human tokens without distillation.

What is Microsoft's MAI-Thinking-1 model and how does it perform?

Microsoft's MAI-Thinking-1 is a mixture-of-experts (MoE) reasoning model with 1T total parameters, of which 35B are active per token. It scores 97% on AIME 2025 and 52.8% on SWE-Bench Pro. Microsoft says the base model was trained from scratch on 30T tokens, predominantly human-generated, without distillation from third-party models.

TL;DR

Microsoft released the MAI-Thinking-1 reasoning model. · 35B active parameters in a 1T MoE architecture. · 97% on AIME 2025; 52.8% on SWE-Bench Pro.

Microsoft unveiled MAI-Thinking-1, a 35B active parameter reasoning model scoring 97% on AIME 2025. The model is the first output of what Microsoft calls a 'hill-climbing machine' — a closed-loop pipeline for iteratively improving reasoning models.

Key facts

  • MAI-Thinking-1: 35B active, 1T total MoE parameters.
  • 97.0% on AIME 2025 math benchmark.
  • 87.7% on LiveCodeBench v6 coding benchmark.
  • 52.8% on SWE-Bench Pro software engineering benchmark.
  • Trained from scratch on 30T human-generated tokens.

Microsoft has introduced MAI-Thinking-1, a reasoning model with 35 billion active parameters inside a 1 trillion total parameter mixture-of-experts (MoE) architecture. The model achieves 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro — strong scores for its active parameter count [According to @rohanpaul_ai].

The Hill-Climbing Pipeline


Microsoft frames MAI-Thinking-1 as the first release from a systematic process it calls a 'hill-climbing machine.' This pipeline integrates data generation, training setup, reward design, safety testing, and evaluation into a single iterative loop. The implication: Microsoft plans to release increasingly capable reasoning models by feeding each cycle's outputs back into the next training run.
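The article does not describe how Microsoft's pipeline works internally, so the following is only a toy illustration of the generic hill-climbing idea the name alludes to: propose a change, evaluate it, and keep it only if the score improves. The functions `propose` and `evaluate` and the numeric objective are hypothetical stand-ins, not anything taken from Microsoft's process.

```python
import random


def hill_climb(initial, propose, evaluate, cycles, seed=0):
    """Generic hill climbing: accept a candidate only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    history = [best_score]  # best score after each cycle
    for _ in range(cycles):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:  # keep strict improvements only
            best, best_score = candidate, score
        history.append(best_score)
    return best, history


# Hypothetical stand-ins: a "model" is a single number and the
# "benchmark" rewards being close to 3.0.
def propose(current, rng):
    return current + rng.uniform(-1.0, 1.0)


def evaluate(candidate):
    return -(candidate - 3.0) ** 2


best, history = hill_climb(0.0, propose, evaluate, cycles=200)
print(f"best candidate: {best:.3f}, best score: {history[-1]:.5f}")
```

Because candidates are accepted only on improvement, the recorded best score never decreases from one cycle to the next. Whether a real training pipeline shows that property depends on the evaluation staying fixed and reliable, which this sketch simply assumes.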

The base model was trained from scratch on 30 trillion tokens, predominantly human-generated. Microsoft explicitly states it avoided distillation from third-party models during pre-training — a notable claim given the industry's reliance on synthetic data from frontier models.

Performance and Architecture


MAI-Thinking-1 uses reinforcement learning to teach math reasoning, coding, tool use, helpfulness, and safety. The MoE design activates only 35B parameters per token, keeping inference costs closer to a dense 35B model while maintaining the representational capacity of a 1T parameter system.
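To make the active-versus-total distinction concrete, here is a back-of-the-envelope calculation using only the two parameter counts reported above. It assumes the common rule of thumb that a forward pass costs roughly 2 FLOPs per parameter used per token; that approximation ignores attention and routing overhead and is not a figure from Microsoft.

```python
ACTIVE_PARAMS = 35e9   # parameters used per token (reported)
TOTAL_PARAMS = 1e12    # parameters across all experts (reported)


def forward_flops_per_token(params_used):
    """Rule-of-thumb estimate (an assumption): ~2 FLOPs per parameter per token."""
    return 2 * params_used


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)
compute_ratio = dense_flops / moe_flops

print(f"active fraction: {active_fraction:.1%}")                        # 3.5%
print(f"dense 1T vs MoE compute per token: {compute_ratio:.1f}x")       # 28.6x
```

The estimate covers compute only. All 1T parameters still have to be stored and served, so memory requirements follow the total count rather than the active count.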

The unique take: Microsoft is positioning this as a reproducible process, not a one-off model. If the hill-climbing machine delivers consistent gains per cycle, Microsoft could close the gap with OpenAI and Anthropic on reasoning benchmarks without needing to match their total compute spend per model — the pipeline becomes the moat, not the checkpoint.

What to watch

Watch for the next model in Microsoft's hill-climbing pipeline, likely within 6-12 months, and whether scores on AIME and SWE-Bench Pro improve by more than 5 points. Also track whether Microsoft publishes a paper detailing the pipeline architecture — the lack of one suggests the process itself is a trade secret.

[Updated 03 Jun via simon_willison]

A technical paper accompanying the release [per Simon Willison] says MAI-Thinking-1 was trained on a proprietary web crawl of 1.2 trillion pages, filtered down to 794 billion pages. The filtering used a UT1 block list and a proprietary AI-content detection model to remove adult content, piracy, and AI-generated text. The paper also reports that Common Crawl contributed 24.2 billion pages after similar filtering and deduplication, which indicates the model relies on public web data despite Microsoft's claim of "clean and commercially licensed" training material.
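As a rough illustration of what block-list filtering of a crawl can look like, the sketch below drops any URL whose host, or any parent domain of that host, appears in a set of blocked domains. The domain names and the helper functions `is_blocked` and `filter_pages` are hypothetical, and the paper's actual filtering code is not described in this article. The last lines compute the retention rate implied by the reported page counts.

```python
from urllib.parse import urlparse


def is_blocked(url, blocked_domains):
    """True if the URL's host or any parent domain is on the block list."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check "cdn.blocked.example", then "blocked.example", then "example".
    return any(".".join(parts[i:]) in blocked_domains for i in range(len(parts)))


def filter_pages(urls, blocked_domains):
    """Keep only URLs that are not covered by the block list."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]


# Hypothetical block list and crawl sample (reserved .example domains).
BLOCKED = {"blocked.example"}
pages = [
    "https://news.example/story",
    "https://cdn.blocked.example/a",
    "https://blocked.example/",
    "https://notblocked.example/x",
]
kept = filter_pages(pages, BLOCKED)
print(kept)

# Retention implied by the reported figures: 794 billion of 1.2 trillion pages.
retention = 794e9 / 1.2e12
print(f"retained after filtering: {retention:.1%}")  # 66.2%
```

Matching on whole domain labels rather than substrings keeps `notblocked.example` in the sample while still catching subdomains of a blocked site.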

Sources cited in this article

  1. Simon Willison

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

Microsoft's move is strategically interesting less for the model's absolute performance — 97% on AIME 2025 is good but not best-in-class — and more for the pipeline narrative. The 'hill-climbing machine' is a direct challenge to the one-off model release cadence of OpenAI and Anthropic. If the pipeline genuinely yields monotonic improvements, Microsoft could achieve a compounding advantage: each cycle's model generates better training data for the next, reducing reliance on external sources. However, the claim of avoiding third-party distillation is hard to verify and may be a competitive signal to regulators. The 30T token dataset, mostly human-generated, is a massive investment in data curation — Microsoft is betting that data quality and pipeline automation matter more than raw compute scaling. The MoE architecture at 35B active / 1T total is similar to Mixtral 8x22B but at larger total scale. The real test will be whether the hill-climbing machine can match or exceed the performance of models trained with significantly more compute, like GPT-4 or Claude 3.5, within two to three cycles.