Nvidia Claims MLPerf Inference v6.0 Records with 288-GPU Blackwell Ultra Systems, Highlights 2.7x Software Gains

MLCommons released MLPerf Inference v6.0 results, introducing multimodal and video model tests. Nvidia set records using 288-GPU Blackwell Ultra systems and achieved a 2.7x performance jump on DeepSeek-R1 via software optimizations alone.

By Gala Smith & AI Research Desk
Source: the-decoder.com (via the_decoder)

MLCommons' latest benchmark round expands to include multimodal and video models, but submissions from Nvidia, AMD, and Intel use different configurations and scenarios, preventing clear cross-vendor comparisons.

On April 1, 2026, MLCommons published the results for MLPerf Inference v6.0, the industry's premier benchmark suite for measuring AI inference performance. This round marks a significant expansion, introducing tests for multimodal and video generation models for the first time. While all three major chipmakers—Nvidia, AMD, and Intel—submitted results, each company highlighted different metrics and system configurations, making a straightforward performance ranking impossible. Notably, Google did not submit results for its latest Ironwood-generation TPUs, and inference specialists like Cerebras were absent.

What's New in MLPerf Inference v6.0

Version 6.0 of the benchmark suite adds five new workloads, reflecting the evolving demands of production AI systems:

  • DeepSeek-R1 (Interactive Scenario): Features a five-times-higher minimum token generation rate requirement compared to previous text model tests.
  • Qwen3-VL-235B: The suite's first multimodal vision-language model.
  • GPT-OSS-120B: OpenAI's open-weight large language model, new to the suite.
  • WAN-2.2-T2V: A text-to-video generation model.
  • DLRMv3: An updated transformer-based recommendation system benchmark.

Only Nvidia submitted results across all five new models and scenarios.

Nvidia's Strategy: Scale and Software

Nvidia's submissions focused on showcasing the scalability of its Blackwell Ultra architecture, with record claims primarily set using massive configurations like the GB300-NVL72 system with 288 GPUs. The company highlighted performance on the new DeepSeek-R1 and GPT-OSS-120B models.


The more telling story, however, is software. Nvidia claims a 2.7x performance jump on DeepSeek-R1 in server scenarios compared to its submission six months ago, achieved on the same hardware through software optimizations delivered by partner Nebius. The company states this cuts token production costs by over 60%. Similar software gains yielded a 1.5x improvement on the older Llama 3.1 405B model.
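
As a back-of-envelope check on the cost claim, the sketch below (Python, illustrative only, not Nvidia's pricing methodology) assumes cost per token scales inversely with throughput on fixed hardware: at 2.7x the tokens per second, cost per token falls to 1/2.7 of its previous level, about a 63% reduction, consistent with "over 60%."

```python
# Back-of-envelope check, assuming cost per token scales inversely with
# throughput on fixed hardware. Illustrative only, not Nvidia's methodology.
speedup = 2.7
new_cost_fraction = 1 / speedup        # ~0.37 of the old cost per token
reduction = 1 - new_cost_fraction      # ~0.63
print(f"cost per token falls by about {reduction:.0%}")  # about 63%, i.e. "over 60%"
```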

Key Software Optimizations

Nvidia detailed several software-level improvements driving these gains:

  • Operation Fusion & Speed-up: Core compute kernels were accelerated and fused, cutting kernel-launch overhead and intermediate memory traffic on the GPU.
  • Nvidia Dynamo: This open-source framework separates the prefill (input processing) and decoding (token generation) phases of text generation, optimizing each independently (a minimal sketch of this idea follows the list).
  • Wide Expert Parallel: For mixture-of-experts models like DeepSeek-R1, this technique distributes expert weights across more GPUs to prevent any single card from becoming a bottleneck.
  • Multi-Token Prediction: In interactive scenarios with small batch sizes, this method generates multiple tokens in parallel to utilize otherwise idle compute power.
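
To make the Dynamo-style disaggregation concrete, here is a toy Python sketch of the idea: one worker pool handles prefill (compute-bound, builds a KV cache per request), another handles decode (memory-bandwidth-bound, streams tokens from that cache), and the two can be sized and scheduled independently. This is not the Dynamo API; all names here (Request, prefill_worker, decode_worker) are hypothetical stand-ins.

```python
# Toy sketch of disaggregated prefill/decode serving. Hypothetical names;
# not the Nvidia Dynamo API.
from dataclasses import dataclass, field
from queue import Queue
import threading

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)  # stand-in for real attention states
    output: list = field(default_factory=list)

def prefill_worker(prefill_q: Queue, decode_q: Queue) -> None:
    """Compute-bound phase: process the whole prompt once, build the KV cache."""
    while True:
        req = prefill_q.get()
        if req is None:                  # shutdown signal
            break
        req.kv_cache = req.prompt.split()  # fake "KV cache" from the prompt
        decode_q.put(req)                # hand off to the decode pool

def decode_worker(decode_q: Queue, done_q: Queue) -> None:
    """Bandwidth-bound phase: generate one token at a time from the cache."""
    while True:
        req = decode_q.get()
        if req is None:
            break
        for i in range(req.max_new_tokens):
            req.output.append(f"tok{i}")  # fake autoregressive step
        done_q.put(req)

prefill_q, decode_q, done_q = Queue(), Queue(), Queue()
# The pools can be sized independently, e.g. few prefill workers (compute-heavy)
# and many decode workers (latency-sensitive), which is the point of disaggregation.
threads = [threading.Thread(target=prefill_worker, args=(prefill_q, decode_q)),
           threading.Thread(target=decode_worker, args=(decode_q, done_q))]
for t in threads:
    t.start()

prefill_q.put(Request(prompt="explain mlperf inference", max_new_tokens=4))
print(done_q.get().output)               # ['tok0', 'tok1', 'tok2', 'tok3']
prefill_q.put(None); decode_q.put(None)  # shut the pools down
for t in threads:
    t.join()
```

In a real deployment the two pools would run on separate GPU partitions and transfer the KV cache over the interconnect; the payoff is that compute-heavy prefill bursts no longer stall latency-sensitive decode steps.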

AMD and Intel's Different Battles

The submissions from AMD and Intel targeted different market segments, avoiding a direct clash with Nvidia's scale.

  • AMD compared its performance against Nvidia's B200 and B300 GPUs in single-node, eight-GPU configurations. It did not submit results for the new DeepSeek-R1 or Qwen3-VL benchmarks, focusing its competitive claims on more constrained system sizes.
  • Intel aimed at the workstation GPU segment, competing in a different tier entirely from the data-center-scale systems Nvidia highlighted.

This strategic fragmentation means buyers must carefully match benchmark scenarios to their own expected deployment environments.

Key Numbers: MLPerf Inference v6.0 Highlights

  • Nvidia GB300-NVL72 (288 GPUs), DeepSeek-R1 (server scenario): 2.7x performance gain vs. six months ago, software-only; highest throughput across the new workloads, with coverage of every new model.
  • Nvidia GB300-NVL72 (288 GPUs), GPT-OSS-120B: top throughput result, showcasing scale on the new OpenAI model.
  • AMD single node (8 GPUs), various workloads (not DeepSeek-R1 or Qwen3-VL): competitive vs. Nvidia B200/B300, focusing on smaller, single-node comparisons.
  • Intel workstation GPUs, various workloads: segment leadership, targeting a different market tier entirely.

What This Means in Practice

For AI engineers, the benchmark expansion to multimodal and video models is the most useful outcome, providing new data points for complex workloads. However, the lack of standardized submissions across vendors forces teams to do extra work to translate results to their own infrastructure plans. Nvidia's demonstrated software gains, a 2.7x throughput increase on unchanged hardware, underline that for existing installations, software and compiler optimizations can be as valuable as a hardware upgrade.

gentic.news Analysis

This MLPerf round continues a trend we've tracked closely: Nvidia leveraging its full-stack advantage—from silicon (Blackwell Ultra) to software (Dynamo, novel parallelism strategies)—to set performance records that competitors struggle to contest on the same terms. The 2.7x software gain on DeepSeek-R1 is particularly significant, following a week of major Nvidia software announcements, including the PivotRL framework that cut agent training costs by a factor of 5.5. It demonstrates that Nvidia's moat is as much about its CUDA software ecosystem as its transistor density.

The absence of Google's TPUs is notable. As we covered in our analysis of the TSMC 2nm capacity constraints, the AI chip landscape is facing supply pressures. Google may be prioritizing Ironwood TPU production for its internal cloud and AI services over benchmark submissions. Meanwhile, AMD's and Intel's targeted submissions reflect a pragmatic strategy: compete where you can, avoid a losing battle on Nvidia's chosen terrain of extreme scale.

This fragmentation in benchmarks mirrors the broader competitive landscape. As noted in our entity relationships, Nvidia both partners with and competes against companies like OpenAI and Meta. These MLPerf results, where Nvidia tests OpenAI's GPT-OSS-120B model, exemplify this complex dynamic. The results also arrive as Nvidia's market valuation soars past $3 trillion, driven by relentless AI infrastructure demand that these benchmarks are designed to measure.

Frequently Asked Questions

What is MLPerf Inference?

MLPerf Inference is a suite of benchmarks developed by MLCommons, an open engineering consortium, to measure the performance of AI systems when running trained models (inference). It is considered the industry standard for fair and objective performance comparisons across different hardware and software platforms.

Why can't I directly compare Nvidia's, AMD's, and Intel's MLPerf results?

The vendors submitted results using different system configurations (e.g., 288 GPUs vs. 8 GPUs), different benchmark scenarios (e.g., server vs. offline), and sometimes entirely different models. Each company optimized its submission to highlight its strengths in a specific market segment, making an apples-to-apples comparison across all their claims impossible without careful normalization.
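
One first-order normalization is per-accelerator throughput. The sketch below (Python, with invented numbers purely for illustration) shows the idea, though a real comparison must also hold the model, scenario, and accuracy target constant:

```python
# Normalizing hypothetical MLPerf-style results to per-GPU throughput.
# All numbers below are made up for illustration; real comparisons also need
# matching scenarios (server vs. offline), accuracy targets, and models.
submissions = [
    {"vendor": "Vendor A", "gpus": 288, "tokens_per_sec": 1_000_000},
    {"vendor": "Vendor B", "gpus": 8,   "tokens_per_sec":    40_000},
]

for s in submissions:
    per_gpu = s["tokens_per_sec"] / s["gpus"]
    print(f'{s["vendor"]}: {per_gpu:,.0f} tokens/s per GPU')
```

Run on these invented numbers, the smaller system actually wins per GPU, which is exactly why headline throughput records and single-node comparisons answer different questions.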

What are the practical implications of Nvidia's 2.7x software gain?

For organizations already operating Nvidia hardware, these software optimizations—likely to be rolled out in future CUDA and framework updates—could raise the throughput of their existing infrastructure for models like DeepSeek-R1 by as much as 2.7x, per Nvidia's claims, without any capital expenditure on new GPUs. This translates directly to lower inference cost per token and increased capacity.

Who uses MLPerf results?

Cloud providers, hardware manufacturers, and large enterprise buyers use MLPerf data to inform purchasing decisions, validate performance claims, and guide system design. Researchers also use it to track the efficiency improvements of AI computing over time.

AI Analysis

The MLPerf v6.0 results are less a leaderboard and more a Rorschach test of chipmaker strategy. Nvidia's submission is a flex of vertical integration, demonstrating that its real advantage lies in co-designing silicon (Blackwell Ultra), systems (288-GPU racks), and system software (Dynamo, Wide Expert Parallel) to set records on the newest, most demanding workloads like video generation and massive MoE models. The 2.7x software gain is the critical takeaway; it shows the performance lifecycle of a GPU architecture is now largely defined in software, creating a moving target for competitors who must chase both hardware and software milestones.

AMD's and Intel's selective submissions are a rational, if defensive, market segmentation play. They acknowledge they cannot win on pure scale against Nvidia's full-stack, hyperscale-optimized approach. Instead, they are competing on price-performance in constrained deployments (single node) or different markets (workstations). This mirrors the broader competitive dynamics in our knowledge graph, where Nvidia simultaneously partners with and competes against cloud giants.

The absence of Google TPUs and Cerebras suggests these players either cannot spare silicon for benchmarks due to supply constraints—a theme in our recent TSMC coverage—or view MLPerf as less relevant to their proprietary technology stacks and customers.

For practitioners, the benchmark's expansion into multimodal and video models is its most valuable contribution, providing hard data on workloads that are becoming mainstream. However, the vendor fragmentation forces engineers to become benchmark detectives, reverse-engineering which submitted configuration actually mirrors their planned production environment. The trend is clear: raw FLOPs are no longer the story; the story is the efficiency of the entire software-hardware stack in delivering usable throughput on specific, complex models.