NVIDIA Releases Nemotron-Cascade 2: A 30B MoE Model with 3B Active Parameters

NVIDIA has open-sourced Nemotron-Cascade 2, a 30B parameter Mixture-of-Experts model that activates only 3B parameters per token. It claims 'gold medal performance' on IMO and IOI 2025 benchmarks.

Source: @HuggingPapers

What Happened

NVIDIA has released Nemotron-Cascade 2 on the Hugging Face Hub. The model is a 30 billion parameter Mixture-of-Experts (MoE) architecture that activates approximately 3 billion parameters per token during inference.

The accompanying announcement claims the model achieves "gold medal performance" on the International Mathematical Olympiad (IMO) 2025 and International Olympiad in Informatics (IOI) 2025 benchmarks. Both are elite competitions for pre-university students whose problem sets are increasingly used to evaluate advanced mathematical reasoning and competitive programming in language models.

Technical Details

Based on the provided specifications:

  • Total Parameters: 30 billion
  • Activated Parameters per Token: ~3 billion (≈10% of the total)
  • Architecture: Mixture-of-Experts (MoE)
  • Availability: Hugging Face Hub

Mixture-of-Experts models route each input token through a subset of expert neural networks (the "activated parameters"), allowing for a large total parameter count while maintaining manageable computational costs during inference. The 10:1 ratio of total to active parameters is a common design point for modern MoE language models.
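The routing mechanism described above can be sketched in a few lines. This is a toy illustration of top-k MoE routing, not Nemotron-Cascade 2's actual architecture: NVIDIA has not published expert counts or routing details, so the expert count, top-k, and layer shapes below are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model   = 64   # hidden size (toy value)
n_experts = 8    # experts per MoE layer (assumed)
top_k     = 2    # experts activated per token (assumed)

# Each "expert" here is a single dense layer; a router scores all experts
# per token, and only the top-k highest-scoring experts are executed.
router_w  = rng.standard_normal((d_model, n_experts))
expert_ws = rng.standard_normal((n_experts, d_model, d_model))

def moe_forward(x):
    """Route each token through its top-k experts, gated by a softmax."""
    logits = x @ router_w                          # (tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, chosen[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                       # softmax over chosen experts
        for gate, e in zip(gates, chosen[t]):
            out[t] += gate * (x[t] @ expert_ws[e])
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_forward(tokens)
print(y.shape)  # (4, 64)
```

Here 2 of 8 experts run per token (a 25% activation ratio); a 30B-total / 3B-active model corresponds to roughly a 10% ratio, typically achieved with more experts and a small top-k.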

Context

The release follows NVIDIA's previous Nemotron model family, which includes instruction-tuned and reward models aimed at synthetic data generation and code completion. Positioning a model against IMO and IOI benchmarks targets the high-end reasoning evaluation space, currently dominated by models like OpenAI's o1, DeepSeek-R1, and Google's Gemini series.

No detailed benchmark scores, methodology, or training data information was provided in the initial announcement. The model card or associated technical report should be consulted for rigorous performance comparisons.

What to Watch

Practitioners should look for:

  1. Published benchmark results on IMO/IOI 2025 and standard coding/math benchmarks (e.g., MATH, HumanEval, SWE-Bench).
  2. Inference performance and hardware requirements for the 30B MoE architecture.
  3. License details and any commercial use restrictions.
  4. Comparison against similarly sized dense models and other MoE models (like Mixtral 8x22B) on throughput and quality.
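On item 2, a back-of-envelope calculation shows why MoE hardware requirements differ from compute requirements. The figures below are rough assumptions (bf16 weights, no KV cache, no quantization), not published numbers for Nemotron-Cascade 2:

```python
# Rough sizing for a 30B-total / 3B-active MoE model.
total_params    = 30e9
active_params   = 3e9
bytes_per_param = 2  # bf16/fp16

# Memory: ALL experts must be resident, so weight memory tracks total params.
weight_mem_gb = total_params * bytes_per_param / 1e9

# Compute: per-token matmul FLOPs track only the ACTIVE params (~2 FLOPs/param).
flops_moe   = 2 * active_params
flops_dense = 2 * total_params

print(f"weights: ~{weight_mem_gb:.0f} GB in bf16")          # ~60 GB
print(f"per-token compute vs dense 30B: {flops_moe / flops_dense:.0%}")  # 10%
```

The asymmetry is the practical trade-off to watch: the model computes like a 3B model but must be stored (and served) like a 30B one.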

The claim of "gold medal performance" requires verification against the official IMO/IOI evaluation criteria: IMO problems are graded on multi-step written proofs, while IOI problems require full algorithmic solutions judged against hidden test cases under time and memory limits.

AI Analysis

The release is notable primarily for its benchmark target. IMO and IOI are among the most challenging public benchmarks for mathematical and algorithmic reasoning, a step beyond more common evaluations like MATH or GSM8K. A 30B MoE model achieving competitive performance here would suggest highly efficient parameter utilization, as current top performers in this category are either much larger models (DeepSeek-R1, for instance, is a 671B-parameter MoE) or rely on extensive reasoning-time computation, as in o1's extended chain-of-thought. However, without published scores or details on the evaluation protocol, it is impossible to assess the claim's validity.

The community will need to see whether the model's performance comes from specialized training on olympiad-style problems, novel architectural choices, or improved reasoning algorithms. The 3B active parameter count suggests a focus on inference efficiency, which could make advanced reasoning more deployable if the quality holds.

For practitioners, the immediate question is whether this model offers a better performance/efficiency trade-off in the reasoning domain compared to alternatives like DeepSeek-Coder-V2, Claude 3.5 Sonnet, or o1-mini. The Hugging Face release should enable independent evaluation on both academic benchmarks and real-world coding and math tasks.
Original source: x.com
