What is MLPerf Training 6.0?

It's the latest peer-reviewed industry benchmark suite for evaluating AI training performance, run by MLCommons.

How does GB300 NVL72 compare to GB200 NVL72?

GB300 NVL72 delivered up to 1.6x faster training at the same scale, using NVFP4 precision and a higher power ceiling.

NVIDIA Blackwell Sweeps MLPerf Training 6.0, GB300 Hits 1.6x Speedup

NVIDIA Blackwell swept MLPerf Training 6.0 across all seven benchmarks. GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale, and NVIDIA's DeepSeek-V3 671B submission ran on 8,192 GPUs.

Ala SMITH & AI Research Desk · Jun 16, 2026 · 3 min read · AI-Generated

Source: blogs.nvidia.com via nvidia_dc_blog, gn_gpu_cluster · Widely Reported

Did NVIDIA Blackwell win MLPerf Training 6.0?

The NVIDIA Blackwell platform swept MLPerf Training 6.0, posting the fastest time to train on all seven benchmarks. The GB300 NVL72 delivered up to 1.6x faster training than the GB200 NVL72 at the same scale, and the DeepSeek-V3 671B submission used 8,192 GPUs.

TL;DR

NVIDIA Blackwell won all 7 MLPerf Training 6.0 benchmarks. · GB300 NVL72 delivered up to 1.6x faster training than GB200. · DeepSeek-V3 671B trained on 8,192 GPUs via NVLink.

NVIDIA Blackwell swept all seven benchmarks in MLPerf Training 6.0. The GB300 NVL72 delivered up to 1.6x faster training than the GB200 NVL72 at the same scale using NVFP4 precision, and NVIDIA's largest submission spanned 8,192 GPUs.

Key facts

NVIDIA won all 7 benchmarks in MLPerf Training 6.0.
GB300 NVL72 achieved up to 1.6x faster training than GB200 NVL72.
DeepSeek-V3 671B trained on 8,192 GPUs via NVLink.
New MoE workloads DeepSeek-V3 671B and GPT-OSS-20B were added.
NVFP4 precision was used for the 550B-parameter Nemotron 3 Ultra model.

NVIDIA's Blackwell platform dominated MLPerf Training 6.0, the latest peer-reviewed industry benchmark suite for AI training performance, according to NVIDIA's blog post. The platform achieved the fastest time to train on every benchmark, including two new mixture-of-experts (MoE) pretraining workloads: DeepSeek-V3 671B and GPT-OSS-20B. NVIDIA was the only platform with submissions across all seven benchmarks in the suite.

The standout result came from the GB300 NVL72 rack-scale system, which delivered up to 1.6x faster training than the GB200 NVL72 at the same scale. Key Blackwell Ultra capabilities driving this improvement include higher compute density with NVFP4 precision, expanded memory capacity, and a higher power ceiling that lets the GPU sustain peak performance. NVIDIA also showcased NVFP4 training methods that increase performance while meeting strict accuracy requirements across large- and small-scale pretraining as well as fine-tuning workloads.
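To make the "up to 1.6x" figure concrete: MLPerf Training reports time to train, so a speedup of s at the same GPU count means the newer system finishes in 1/s of the baseline time. The short sketch below works through that arithmetic. The 10-minute baseline is a hypothetical number chosen for illustration, not a published result, and the function names are ours.

```python
def time_after_speedup(baseline_minutes: float, speedup: float) -> float:
    """Time to train on the faster system, given the baseline time and a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


def time_saved_fraction(speedup: float) -> float:
    """Fraction of the baseline time eliminated by a given speedup."""
    return 1.0 - 1.0 / speedup


# Hypothetical baseline: a GB200 NVL72 run that takes 10 minutes.
baseline = 10.0
faster = time_after_speedup(baseline, 1.6)
print(f"{faster:.2f} min")                          # 6.25 min
print(f"{time_saved_fraction(1.6):.1%} less time")  # 37.5% less time
```

In other words, a 1.6x speedup removes 37.5% of the wall-clock time, not 60%. Because the figure is quoted as "up to", individual benchmarks may show smaller gains.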

MoE Training at Scale

Large-scale MoE training faces the same all-to-all communication challenge as MoE inference: tokens must be routed across GPUs to reach the right expert subnetwork. NVIDIA's fifth-generation NVLink Switches connect all 72 GPUs within each rack-scale system into a unified, high-bandwidth pool of compute and memory, enabling them to act as one giant GPU. According to NVIDIA, this NVLink bandwidth advantage is what makes MoE training fast and efficient at scale.
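A toy simulation can illustrate why expert routing turns into all-to-all traffic. The sketch below is not NVIDIA's implementation: it spreads experts evenly across 72 GPUs (the rack size from the article), stands in for a learned router with uniform random top-k selection, and counts how many token-to-expert dispatches leave the GPU that holds the token. The expert count per GPU, tokens per GPU, and top-k value are illustrative assumptions.

```python
import random


def expert_to_gpu(expert_id: int, experts_per_gpu: int) -> int:
    """Experts are placed contiguously: experts 0..k-1 on GPU 0, and so on."""
    return expert_id // experts_per_gpu


def count_dispatches(num_gpus, experts_per_gpu, tokens_per_gpu, top_k, seed=0):
    """Return (local, remote) token-to-expert dispatch counts for one MoE layer."""
    rng = random.Random(seed)
    num_experts = num_gpus * experts_per_gpu
    local = remote = 0
    for src_gpu in range(num_gpus):
        for _ in range(tokens_per_gpu):
            # Stand-in for a learned router: pick top_k distinct experts uniformly.
            for expert in rng.sample(range(num_experts), top_k):
                if expert_to_gpu(expert, experts_per_gpu) == src_gpu:
                    local += 1
                else:
                    remote += 1
    return local, remote


# 72 GPUs matches one NVL72 rack; the other values are hypothetical.
local, remote = count_dispatches(num_gpus=72, experts_per_gpu=4, tokens_per_gpu=256, top_k=8)
total = local + remote
print(f"remote share: {remote / total:.1%}")
```

Under uniform routing, the expected remote share is (G - 1) / G for G GPUs, so with 72 GPUs nearly every dispatch crosses a GPU boundary. Real routers are not uniform, but the qualitative point holds: interconnect bandwidth between GPUs sits on the critical path of every MoE layer.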

To support distributed training at scale, NVIDIA offers two complementary scale-out networking platforms — NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X Ethernet — giving data centers flexibility to build large-scale clusters optimized for their infrastructure. On DeepSeek-V3 671B, NVIDIA submitted results using 8,192 GPUs, the largest Blackwell cluster in MLPerf Training history.

Historical Context and Competition

This sweep comes as NVIDIA faces increasing competition from custom silicon and alternative architectures. Google's TPU v6, AMD's MI400, and Cerebras CS-3 have all posted competitive results in previous MLPerf rounds. However, NVIDIA's ability to deliver both the fastest single-system performance and the largest-scale distributed training results — while being the only vendor to submit across all benchmarks — reinforces its dominant position in AI training infrastructure.

The GB300 NVL72's up to 1.6x speedup over the GB200 NVL72 is particularly notable given how closely the two systems followed each other: NVIDIA announced the Blackwell architecture in March 2024 and Blackwell Ultra, the basis of GB300, in March 2025. This rapid generational improvement suggests NVIDIA's engineering cadence remains aggressive, consistent with the one-year architecture cycle Jensen Huang has publicly described.

What to Watch

Watch for the MLPerf Inference 7.0 results expected in Q4 2026, where NVIDIA will face pressure from AMD's MI400 and Google's TPU v6 on latency-sensitive workloads. Also monitor whether CoreWeave or other cloud providers can replicate NVIDIA's 8,192-GPU DeepSeek-V3 training result on their own clusters, which would validate the scalability claims independently.

Key Takeaways

NVIDIA Blackwell swept MLPerf Training 6.0 across all seven benchmarks.
GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale using NVFP4; the DeepSeek-V3 671B submission ran on 8,192 GPUs.

Sources cited in this article

NVIDIA

Source: gentic.news · Jun 16, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

NVIDIA's MLPerf sweep is less about raw performance and more about ecosystem lock-in. The up to 1.6x GB300 speedup over GB200 is impressive, but the real story is that NVIDIA was the only vendor to submit across all seven benchmarks. This breadth matters because MLPerf submissions require significant engineering effort: each benchmark demands custom kernel tuning, distributed strategy optimization, and validation. Competitors like AMD and Google cherry-pick workloads where their architectures shine, while NVIDIA demonstrates that its platform handles the full spectrum of training workloads, from small fine-tuning to massive MoE pretraining.

The NVFP4 precision story is particularly interesting. By reducing precision to 4-bit floating point while maintaining accuracy, NVIDIA effectively increases throughput without requiring larger clusters. This is a direct response to a compute-constrained environment in which frontier-scale training runs occupy thousands of GPUs. If NVFP4 can be generalized to other architectures, it could reshape training economics.

However, independent replication is still missing. MLPerf results are peer-reviewed, but these submissions come from NVIDIA's own labs and cloud partners. Third-party runs from CoreWeave or Google Cloud would strengthen the claims. The DeepSeek-V3 671B result on 8,192 GPUs is the most impressive technical achievement here: MoE training at that scale requires solving complex routing and load-balancing problems that most vendors avoid.
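For readers unfamiliar with 4-bit floating point, the sketch below shows the general idea of block-scaled FP4 quantization in plain Python. It assumes the E2M1 value grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and 16-element blocks that share one scale, which matches NVIDIA's public description of NVFP4 as we understand it. It is a simplification, not NVIDIA's training recipe: the scale is kept as an ordinary Python float rather than a low-precision value, there is no second-level per-tensor scale, and the function names are ours.

```python
# Magnitudes representable by a 4-bit E2M1 float (a sign bit is stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumed block size; each block shares one scale


def quantize_block(block):
    """Round one block onto the scaled E2M1 grid and return the dequantized floats."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    # Choose the scale so the largest magnitude in the block maps to 6.0.
    scale = max_abs / E2M1_MAGNITUDES[-1]
    out = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        out.append((nearest if x >= 0 else -nearest) * scale)
    return out


def fake_quantize(values, block_size=BLOCK_SIZE):
    """Quantize and dequantize a flat list of floats, block by block."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_block(values[start:start + block_size]))
    return out


values = [i / 10 - 1.6 for i in range(32)]
approx = fake_quantize(values)
max_err = max(abs(a - b) for a, b in zip(values, approx))
print(f"max abs error: {max_err:.3f}")
```

Each value occupies 4 bits plus a share of its block's scale, versus 16 bits for BF16, which is where the throughput and memory savings come from. The per-block scale is what keeps the rounding error tolerable when magnitudes vary across a tensor.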
