NVIDIA has released Nemotron-Cascade 2, an open-weight large language model (LLM) with a Mixture-of-Experts (MoE) architecture. The model totals 30 billion parameters but activates only 3 billion during inference, a design focused on maximizing "intelligence density" for reasoning and agentic tasks. It is the second open-weight LLM to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals.
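The sparse-activation idea behind the 30B-total / 3B-active split can be illustrated with a toy top-k router. This is a generic MoE routing sketch under the usual top-k gating assumption, not NVIDIA's implementation; the function name and shapes are illustrative.

```python
import math

def route(token_logits, k=2):
    """Pick the top-k experts for one token and softmax-normalize their gate weights.

    Because only k experts (out of len(token_logits)) run per token, the
    active parameter count stays a small fraction of the total -- the
    mechanism by which a 30B-parameter MoE can activate only ~3B.
    """
    top = sorted(range(len(token_logits)), key=lambda i: -token_logits[i])[:k]
    exp = [math.exp(token_logits[i]) for i in top]
    total = sum(exp)
    return [(i, w / total) for i, w in zip(top, exp)]
```

Each token's hidden state would then be the gate-weighted sum of the selected experts' outputs.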
Targeted Performance and Strategic Trade-offs
The model is engineered for specialized performance in mathematical reasoning, coding, alignment, and instruction following. According to the release, it is not a "blanket win" across all benchmarks, but it excels in targeted categories when compared with recent models such as Qwen3.5-35B-A3B (released February 2026) and the larger Nemotron-3-Super-120B-A12B.
Key benchmark results include:
| Benchmark | Nemotron-Cascade 2 | Comparison model |
|---|---|---|
| AIME 2025 | 92.4 | 91.9 |
| HMMT Feb 2025 | 94.6 | 89.0 |
| LiveCodeBench v6 | 87.2 | 74.6 |
| IOI 2025 | 439.28 | 348.6+ |
| ArenaHard v2 | 83.5 | 65.4+ |
| IFBench | 82.9 | 70.2 |

Technical Architecture: Cascade RL and Multi-Domain On-Policy Distillation (MOPD)
The model's capabilities stem from a post-training pipeline applied to the Nemotron-3-Nano-30B-A3B-Base model.

1. Supervised Fine-Tuning (SFT)
The SFT phase used a curated dataset packed into sequences of up to 256K tokens. This dataset included:
- 1.9 million Python reasoning traces and 1.3 million Python tool-calling samples for competitive coding.
- 816,000 samples for mathematical natural language proofs.
- A specialized Software Engineering (SWE) blend of 125,000 agentic and 389,000 agentless samples.
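Packing a mixed-length dataset "into sequences of up to 256K tokens" typically means concatenating short samples until a token budget would overflow. The greedy sketch below illustrates that preprocessing step; the function and constant names are hypothetical, and production pipelines add details (attention masking across sample boundaries, length-aware sorting) omitted here.

```python
MAX_LEN = 256 * 1024  # 256K-token packing budget, per the SFT description

def pack(sample_lengths, max_len=MAX_LEN):
    """Greedily group sample indices into packs whose total length fits max_len."""
    packs, current, used = [], [], 0
    for i, n in enumerate(sample_lengths):
        # Start a new pack when this sample would overflow the budget.
        if used + n > max_len and current:
            packs.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        packs.append(current)
    return packs
```

A sample longer than the budget still gets its own pack here; a real pipeline would truncate or split it.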
2. Cascade Reinforcement Learning
Following SFT, the model underwent Cascade RL, a sequential, domain-wise training process designed to prevent catastrophic forgetting. This pipeline includes stages for:
- Instruction-following (IF-RL)
- Multi-domain RL
- RLHF
- Long-context RL
- Specialized Code and SWE RL
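The sequential, domain-wise structure above can be sketched as a loop over stages in which each stage starts from the previous checkpoint and regressions on earlier domains are rejected. This is an assumption-laden illustration of the "cascade" control flow, not NVIDIA's pipeline: `train_stage`, `evaluate`, and the gating rule are all hypothetical.

```python
# Stage names follow the order listed in the article.
STAGES = ["if_rl", "multi_domain_rl", "rlhf", "long_context_rl", "code_swe_rl"]

def cascade_rl(checkpoint, train_stage, evaluate, tol=0.01):
    """Run RL stages sequentially, guarding earlier domains against forgetting.

    train_stage(checkpoint, stage) -> candidate checkpoint after RL on `stage`.
    evaluate(checkpoint, stage)    -> held-out score for `stage`.
    """
    baselines = {}
    for stage in STAGES:
        candidate = train_stage(checkpoint, stage)
        # Accept the update only if no previously trained domain regressed
        # beyond `tol` -- a simple stand-in for forgetting mitigation.
        if all(evaluate(candidate, s) >= baselines[s] - tol for s in baselines):
            checkpoint = candidate
        baselines[stage] = evaluate(checkpoint, stage)
    return checkpoint
```

The point of the sketch is the ordering constraint: later stages inherit, and must preserve, what earlier stages learned.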
3. Multi-Domain On-Policy Distillation (MOPD)
A key innovation is the integration of MOPD during Cascade RL. This technique uses the best-performing intermediate "teacher" models—derived from the same SFT initialization—to provide a dense token-level distillation advantage. The advantage is defined mathematically as:
$$a_{t}^{\mathrm{MOPD}} = \log \pi^{\mathrm{domain}_t}(y_{t} \mid s_{t}) - \log \pi^{\mathrm{train}}(y_{t} \mid s_{t})$$
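The advantage is just a per-token log-probability gap between the domain teacher and the policy being trained, which can be computed directly from the two models' logits. A minimal sketch, assuming logits are given as plain nested lists (function names are illustrative):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def mopd_advantage(teacher_logits, train_logits, token_ids):
    """Per-token MOPD advantage: log pi_domain(y_t|s_t) - log pi_train(y_t|s_t).

    teacher_logits, train_logits: one logit row per position.
    token_ids: the sampled token y_t at each position.
    """
    return [
        log_softmax(t_row)[y] - log_softmax(s_row)[y]
        for t_row, s_row, y in zip(teacher_logits, train_logits, token_ids)
    ]
```

The sign matches the formula: the advantage is positive wherever the teacher assigns the sampled token more probability than the trained policy does, giving a dense signal at every token rather than a single sequence-level reward.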
Availability and Context
The model has been released as open weights on the Hugging Face Hub. The announcement follows other recent NVIDIA news, including CEO Jensen Huang's mandate that engineers spend 50% of their salary on AI inference tokens to drive productivity, and the announcement of OpenClaw software.