What is the Vera Rubin NVL72?

The Vera Rubin NVL72 is NVIDIA's next-generation AI inference platform, built for agentic AI workloads and offering up to 10x lower cost-per-token than Blackwell.

Why does cost-per-token matter for agentic AI?

Agentic AI workflows consume 5-10x more tokens per task than single-shot generation, making cost reduction critical for economic viability.

NVIDIA Vera Rubin NVL72 Cuts Agentic AI Cost 10x vs Blackwell

NVIDIA Vera Rubin NVL72 cuts agentic AI inference cost 10x vs Blackwell, per Huang at Dell event. 5,000 enterprises already on Dell factories.

Ala SMITH & AI Research Desk · May 18, 2026 · 4 min read · AI-Generated

Source: blogs.nvidia.com via nvidia_dc_blog, gn_gpu_cluster

How does NVIDIA Vera Rubin NVL72 reduce cost for agentic AI inference?

NVIDIA's Vera Rubin NVL72, announced at Dell Technologies World, delivers up to 10x lower cost-per-token than Blackwell for agentic AI inference, per CEO Jensen Huang. Dell also introduced PowerEdge XE9812 servers built on the platform, and agent sandboxes run 50% faster on NVIDIA Vera than on traditional CPUs.

TL;DR

NVIDIA Vera Rubin NVL72 delivers 10x lower cost-per-token · 5,000 enterprises run AI on Dell factories · Huang calls demand 'utterly parabolic'

NVIDIA CEO Jensen Huang declared AI demand 'utterly parabolic' at Dell Technologies World. The Vera Rubin NVL72 platform cuts agentic inference cost-per-token 10x versus Blackwell.

Key facts

Vera Rubin NVL72 delivers 10x lower cost-per-token than Blackwell
5,000 enterprises run AI on Dell factories with NVIDIA
Token consumption projected to grow 3,400% by 2030
AI infrastructure spend could reach $3-4 trillion by 2030
Vera CPU offers 1.2 TB/s memory bandwidth

Dell Technologies World opened Monday with a joint keynote from Michael Dell and Jensen Huang, centering on the Dell AI Factory with NVIDIA as the platform for enterprise agentic AI. The headline hardware: the Dell PowerEdge XE9812, built on NVIDIA Vera Rubin NVL72, which delivers up to 10x lower cost-per-token than NVIDIA Blackwell for massive-scale agentic AI inferencing [According to NVIDIA's blog post].

Agent sandboxes run 50% faster on NVIDIA Vera than traditional CPUs, while enterprise data queries are up to 3x faster with the Vera CPU. Huang framed the shift: 'We've now arrived at the era of useful AI, which is the reason why demand is going parabolic, utterly parabolic.'

The Vera Rubin Lineup and Enterprise Scale

The PowerEdge XE9812 is joined by three new servers — XE9880L, XE9885L, and XE9882L — the first Dell systems built on NVIDIA HGX Rubin NVL8, supporting up to 144 GPUs per rack with 100% direct liquid-cooled compute nodes and up to 10x the performance of HGX B200. Networking gets the Dell PowerSwitch portfolio with NVIDIA Quantum-X800 InfiniBand and Spectrum-6 Ethernet.

Dell also introduced PowerRack, a fully integrated compute, networking, and storage system engineered as one, aiming to eliminate integration overhead for enterprises. On the CPU side, PowerEdge M9822 and R9822 servers bring NVIDIA Vera CPUs with 1.2 TB/s memory bandwidth, purpose-built for agentic AI workloads.

Michael Dell sized the stakes: worldwide AI infrastructure spending could reach $3-4 trillion by 2030, with token consumption projected to grow 3,400% in the same window. 5,000 enterprises, including Lilly, Samsung, and Honeywell, are already running AI workloads on Dell AI Factories with NVIDIA.
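For scale, a 3,400% increase means 35 times today's token volume, not 34 times. The sketch below converts the quoted percentage into a multiplier and into an implied yearly growth factor. The article does not state the base year, so the 4-year and 5-year windows are assumptions for illustration only, and the function names are hypothetical.

```python
def growth_multiplier(percent_growth: float) -> float:
    """Convert a percentage increase into a multiplier (100% growth -> 2x)."""
    return 1.0 + percent_growth / 100.0


def implied_annual_factor(multiplier: float, years: int) -> float:
    """Constant yearly growth factor that compounds to `multiplier` over `years`."""
    return multiplier ** (1.0 / years)


total = growth_multiplier(3400)  # 3,400% is the figure quoted in the keynote

# ASSUMPTION: the window length is not given in the article; 4 and 5 years
# are illustrative choices for "by 2030".
for years in (4, 5):
    yearly = implied_annual_factor(total, years)
    print(f"{years}-year window: {total:.0f}x total, about {yearly:.2f}x per year")
```

Under either assumed window, token consumption would have to roughly double or more every year to hit the projection.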

The Unique Take: Cost Curve Inversion for Agentic Inference

The critical shift here isn't just raw performance; it's the cost-per-token inversion. Prior to Vera Rubin, agentic AI inference was economically prohibitive at scale because each agent loop (plan, tool call, context fetch, response) multiplies token consumption by 5-10x over single-shot generation. Vera Rubin's up-to-10x cost reduction offsets that multiplier and makes the agentic workflow viable where it wasn't before. This aligns with the 3,400% token growth projection: if cost per token drops 10x, workloads that were previously uneconomical become viable, so demand can expand faster than Moore's Law-style hardware gains alone would predict.
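The inversion argument comes down to one division: the agentic token multiplier over the hardware cost reduction. The sketch below is a minimal illustration using only the 5-10x and 10x figures quoted above. The function name is hypothetical, and it assumes the full "up to 10x" reduction is actually realized.

```python
def agentic_cost_ratio(token_multiplier: float, cost_reduction: float) -> float:
    """Cost of one agentic task on the new hardware, relative to one
    single-shot task on the old hardware (1.0 means break-even)."""
    return token_multiplier / cost_reduction


# 5x and 10x bracket the token overhead quoted for agentic loops.
# ASSUMPTION: the best-case 10x cost-per-token reduction applies.
for multiplier in (5, 10):
    ratio = agentic_cost_ratio(multiplier, cost_reduction=10)
    print(f"{multiplier}x tokens per task -> {ratio:.1f}x the old single-shot cost")
```

At the low end of the range an agentic task costs half of what a single-shot generation did on Blackwell; at the high end it only breaks even, so the benefit depends on how token-hungry the agent loop is.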

Huang's 'parabolic' language echoes his earlier dispute with Anthropic CEO Dario Amodei over $1 trillion AI revenue forecasts — Huang called that figure 'too conservative' at GTC 2026. The Vera Rubin NVL72 is the hardware bet that enterprise agentic AI will consume tokens at a rate that justifies $3-4 trillion in infrastructure spend by 2030.

What to watch

Watch for Vera Rubin NVL72 pricing and availability timelines from Dell and NVIDIA in Q3 2026. Also track whether enterprise adoption of agentic AI on Dell factories crosses 10,000 customers, and whether rival platforms like AMD's MI400 or Cerebras WSE-3 respond with comparable cost-per-token claims.

[Updated 22 May via gn_gpu_cluster]

A single Vera Rubin NVL72 rack is priced at $7.8 million, double the cost of a Blackwell rack, with memory costs up 485% and now accounting for 25% of total system cost, according to Tom's Hardware. Rubin GPUs themselves cost around $50,000 each. The pricing underscores the scale of NVIDIA's next-generation infrastructure bet, even as rivals like Cerebras challenge its inference dominance [per Tom's Hardware].
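The reported prices can be cross-checked against the cost-per-token claim with some back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes 72 GPUs per rack (inferred from the NVL72 name, not stated in the pricing report), and it assumes cost-per-token scales with rack purchase price alone, ignoring power, cooling, networking, and utilization. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackEstimate:
    memory_cost_usd: float
    gpu_cost_usd: float
    gpu_share: float
    implied_prior_rack_usd: float
    implied_tokens_per_rack_gain: float


def estimate_rack(rack_price_usd: float, memory_share: float,
                  gpu_price_usd: float, gpus_per_rack: int,
                  price_ratio_vs_prior: float,
                  cost_per_token_gain: float) -> RackEstimate:
    """Derive rough cost components from the reported figures."""
    gpu_cost = gpu_price_usd * gpus_per_rack
    return RackEstimate(
        memory_cost_usd=rack_price_usd * memory_share,
        gpu_cost_usd=gpu_cost,
        gpu_share=gpu_cost / rack_price_usd,
        implied_prior_rack_usd=rack_price_usd / price_ratio_vs_prior,
        # If a rack costs 2x as much but each token costs 1/10 as much,
        # the rack must produce 2 * 10 = 20x the tokens (hardware cost only).
        implied_tokens_per_rack_gain=price_ratio_vs_prior * cost_per_token_gain,
    )


estimate = estimate_rack(
    rack_price_usd=7_800_000,    # reported rack price
    memory_share=0.25,           # reported memory share of system cost
    gpu_price_usd=50_000,        # "around $50,000 each"
    gpus_per_rack=72,            # ASSUMPTION: inferred from the NVL72 name
    price_ratio_vs_prior=2.0,    # "double the cost of a Blackwell rack"
    cost_per_token_gain=10.0,    # ASSUMPTION: best-case "up to 10x"
)
print(estimate)
```

Under these assumptions, memory comes to about $1.95 million per rack, GPUs to about $3.6 million (a little under half the rack price), and a rack would need to serve roughly 20 times the tokens of a Blackwell rack for the 10x cost-per-token figure to hold on hardware cost alone.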

Sources cited in this article

NVIDIA blog post (blogs.nvidia.com)
Jensen Huang, keynote at Dell Technologies World
Tom's Hardware

Source: gentic.news · May 18, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from 4 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The Vera Rubin NVL72 announcement represents a structural shift in the economics of agentic AI. Prior to this, the industry faced a cost-per-token wall: agentic loops (plan, tool call, context fetch, response) consume 5-10x more tokens than single-shot generation, making them uneconomical for most enterprise use cases. A cost reduction of up to 10x offsets that multiplier: an agentic task on Vera Rubin costs between half as much as and the same as a single-shot generation did on prior hardware. This is the core insight that justifies Huang's 'parabolic' demand claim.

Dell's positioning as the systems integrator is also notable. By offering PowerRack as a fully integrated system with thermal, power, and software optimization, Dell is targeting the enterprise pain point that has slowed AI adoption: integration overhead. The 5,000 customer count (Lilly, Samsung, Honeywell) suggests momentum, but the real test will be whether that number accelerates post-Vera Rubin.

Huang's rhetoric ('utterly parabolic,' 'what took months now takes weeks') should be read as a competitive signal. At GTC 2026, he disputed Amodei's $1 trillion forecast as too conservative. The Vera Rubin NVL72 is the hardware that backs up that bullishness. The question is whether supply can match the demand he's projecting, and whether competitors like AMD, Cerebras, or Google TPU can match the cost-per-token curve.
