NVIDIA has released Nemotron-Cascade 2, an open-weight large language model (LLM) with a Mixture-of-Experts (MoE) architecture. The model totals 30 billion parameters but activates only 3 billion during inference, a design focused on maximizing "intelligence density" for reasoning and agentic tasks. It is the second open-weight LLM to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals.
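The sparse-activation idea behind the 30B-total / 3B-active split can be illustrated with a toy top-k router. This is a generic MoE routing sketch under the usual top-k gating assumption, not NVIDIA's implementation; the function name and shapes are illustrative.

```python
import math

def route(token_logits, k=2):
    """Pick the top-k experts for one token and softmax-normalize their gate weights.

    Because only k experts (out of len(token_logits)) run per token, the
    active parameter count stays a small fraction of the total -- the
    mechanism by which a 30B-parameter MoE can activate only ~3B.
    """
    top = sorted(range(len(token_logits)), key=lambda i: -token_logits[i])[:k]
    exp = [math.exp(token_logits[i]) for i in top]
    total = sum(exp)
    return [(i, w / total) for i, w in zip(top, exp)]
```

Each token's hidden state would then be the gate-weighted sum of the selected experts' outputs.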
Targeted Performance and Strategic Trade-offs
The model is engineered for specialized performance in mathematical reasoning, coding, alignment, and instruction following. According to the release, it is not a "blanket win" across all benchmarks, but it excels in targeted categories when compared with recent models such as Qwen3.5-35B-A3B (released February 2026) and the larger Nemotron-3-Super-120B-A12B.
Key benchmark results include:
| Benchmark | Nemotron-Cascade 2 | Comparison model |
|---|---|---|
| AIME 2025 | 92.4 | 91.9 |
| HMMT Feb 2025 | 94.6 | 89.0 |
| LiveCodeBench v6 | 87.2 | 74.6 |
| IOI 2025 | 439.28 | 348.6+ |
| ArenaHard v2 | 83.5 | 65.4+ |
| IFBench | 82.9 | 70.2 |

Technical Architecture: Cascade RL and Multi-Domain On-Policy Distillation (MOPD)
The model's capabilities stem from a post-training pipeline applied to the Nemotron-3-Nano-30B-A3B-Base model.

1. Supervised Fine-Tuning (SFT)
The SFT phase used a curated dataset packed into sequences of up to 256K tokens. This dataset included:
- 1.9 million Python reasoning traces and 1.3 million Python tool-calling samples for competitive coding.
- 816,000 samples for mathematical natural language proofs.
- A specialized Software Engineering (SWE) blend of 125,000 agentic and 389,000 agentless samples.
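Packing a mixed-length dataset "into sequences of up to 256K tokens" typically means concatenating short samples until a token budget would overflow. The greedy sketch below illustrates that preprocessing step; the function and constant names are hypothetical, and production pipelines add details (attention masking across sample boundaries, length-aware sorting) omitted here.

```python
MAX_LEN = 256 * 1024  # 256K-token packing budget, per the SFT description

def pack(sample_lengths, max_len=MAX_LEN):
    """Greedily group sample indices into packs whose total length fits max_len."""
    packs, current, used = [], [], 0
    for i, n in enumerate(sample_lengths):
        # Start a new pack when this sample would overflow the budget.
        if used + n > max_len and current:
            packs.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        packs.append(current)
    return packs
```

A sample longer than the budget still gets its own pack here; a real pipeline would truncate or split it.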
2. Cascade Reinforcement Learning
Following SFT, the model underwent Cascade RL, a sequential, domain-wise training process designed to prevent catastrophic forgetting. This pipeline includes stages for:
- Instruction-following (IF-RL)
- Multi-domain RL
- RLHF
- Long-context RL
- Specialized Code and SWE RL
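The sequential, domain-wise structure above can be sketched as a loop over stages in which each stage starts from the previous checkpoint and regressions on earlier domains are rejected. This is an assumption-laden illustration of the "cascade" control flow, not NVIDIA's pipeline: `train_stage`, `evaluate`, and the gating rule are all hypothetical.

```python
# Stage names follow the order listed in the article.
STAGES = ["if_rl", "multi_domain_rl", "rlhf", "long_context_rl", "code_swe_rl"]

def cascade_rl(checkpoint, train_stage, evaluate, tol=0.01):
    """Run RL stages sequentially, guarding earlier domains against forgetting.

    train_stage(checkpoint, stage) -> candidate checkpoint after RL on `stage`.
    evaluate(checkpoint, stage)    -> held-out score for `stage`.
    """
    baselines = {}
    for stage in STAGES:
        candidate = train_stage(checkpoint, stage)
        # Accept the update only if no previously trained domain regressed
        # beyond `tol` -- a simple stand-in for forgetting mitigation.
        if all(evaluate(candidate, s) >= baselines[s] - tol for s in baselines):
            checkpoint = candidate
        baselines[stage] = evaluate(checkpoint, stage)
    return checkpoint
```

The point of the sketch is the ordering constraint: later stages inherit, and must preserve, what earlier stages learned.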
3. Multi-Domain On-Policy Distillation (MOPD)
A key innovation is the integration of MOPD during Cascade RL. This technique uses the best-performing intermediate "teacher" models—derived from the same SFT initialization—to provide a dense token-level distillation advantage. The advantage is defined mathematically as:
$$a_{t}^{\mathrm{MOPD}} = \log \pi^{\mathrm{domain}_t}(y_{t} \mid s_{t}) - \log \pi^{\mathrm{train}}(y_{t} \mid s_{t})$$
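The advantage is just a per-token log-probability gap between the domain teacher and the policy being trained, which can be computed directly from the two models' logits. A minimal sketch, assuming logits are given as plain nested lists (function names are illustrative):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def mopd_advantage(teacher_logits, train_logits, token_ids):
    """Per-token MOPD advantage: log pi_domain(y_t|s_t) - log pi_train(y_t|s_t).

    teacher_logits, train_logits: one logit row per position.
    token_ids: the sampled token y_t at each position.
    """
    return [
        log_softmax(t_row)[y] - log_softmax(s_row)[y]
        for t_row, s_row, y in zip(teacher_logits, train_logits, token_ids)
    ]
```

The sign matches the formula: the advantage is positive wherever the teacher assigns the sampled token more probability than the trained policy does, giving a dense signal at every token rather than a single sequence-level reward.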
Availability and Context
The model has been released as open weights on the Hugging Face Hub. The announcement follows other recent NVIDIA news, including CEO Jensen Huang's mandate that engineers spend 50% of their salary on AI inference tokens to drive productivity, and the announcement of OpenClaw software.