gentic.news — AI News Intelligence Platform

A close-up of an NVIDIA Blackwell GPU with NVLink connectors, paired with a performance chart showing a 61%…
AI Research · Score: 95

Blackwell Confidential Computing Disables NVLink Multicast, 61% Regression Reported

NVIDIA Blackwell confidential computing disables NVLink multicast, causing 61% regression on SGLang Qwen3.5 397B. Hopper had unencrypted NVLink, compounding the issue.

8h ago · 3 min read · AI-Generated
What performance impact does NVIDIA's Blackwell confidential computing have on NVLink multicast?

NVIDIA's Blackwell confidential computing disables NVLink multicast, causing a 61% performance regression on SGLang Qwen3.5 397B, per @SemiAnalysis_ and @verdacloud's GitHub ticket.

TL;DR

Blackwell NVLink multicast unsupported in confidential mode · 61% performance regression on SGLang Qwen3.5 397B · Hopper confidential compute had unencrypted NVLink

NVIDIA Blackwell's confidential computing disables NVLink multicast, causing a 61% performance regression on SGLang Qwen3.5 397B. The finding, from a GitHub ticket by @verdacloud, was amplified by @SemiAnalysis_.

Key facts

  • 61% performance regression on SGLang Qwen3.5 397B
  • NVLink multicast unsupported in Blackwell confidential computing
  • Hopper confidential computing had unencrypted NVLink
  • Finding from @verdacloud GitHub ticket, amplified by @SemiAnalysis_
  • Regression affects large-model inference in regulated environments

NVIDIA's Blackwell architecture does not support NVLink multicast when confidential computing is enabled, which leads to a 61% performance regression on SGLang Qwen3.5 397B, according to a GitHub ticket from @verdacloud that was amplified by @SemiAnalysis_ [@SemiAnalysis_]. The regression is particularly severe for large-model inference, where NVLink multicast, which lets one GPU broadcast data to multiple GPUs simultaneously, is essential for reducing communication overhead.

The issue is compounded by NVIDIA's own documentation. The company's whitepaper, "NVIDIA Secure AI with Blackwell and Hopper GPUs," reveals that Hopper's confidential computing had fully unencrypted NVLink, meaning the previous generation's "secure" mode was incomplete [NVIDIA whitepaper]. This suggests NVIDIA's confidential computing story has been inconsistent across generations.

The 61% regression on SGLang Qwen3.5 397B is likely close to a worst case for large-model inference. SGLang, a popular inference engine for large language models, relies heavily on NVLink multicast for tensor parallelism across GPUs. Without multicast, each GPU must exchange data with every other GPU individually, which increases latency and reduces throughput.
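
To make the multicast point concrete, here is a minimal Python sketch of how much data a broadcasting GPU has to push onto its links with and without switch-level replication. It is an idealized model, not a measurement: the function name, the 8-GPU group and the 64 MiB payload are illustrative assumptions, and it ignores encryption overhead, protocol headers and link contention.

```python
def bytes_sent_by_source(payload_bytes: int, num_gpus: int, multicast: bool) -> int:
    """Bytes the broadcasting GPU must send to reach every other GPU.

    Idealized model (assumption): with multicast the source sends one copy
    and the fabric replicates it; without multicast the source sends one
    separate copy to each receiver.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if num_gpus < 2:
        return 0  # nobody to send to
    receivers = num_gpus - 1
    return payload_bytes if multicast else payload_bytes * receivers


# Hypothetical example values, chosen only for illustration.
PAYLOAD = 64 * 1024 * 1024  # 64 MiB buffer
GPUS = 8                    # one 8-GPU tensor-parallel group

with_multicast = bytes_sent_by_source(PAYLOAD, GPUS, multicast=True)
without_multicast = bytes_sent_by_source(PAYLOAD, GPUS, multicast=False)
traffic_ratio = without_multicast // with_multicast  # 7 for an 8-GPU group
```

In this toy model, the source in an 8-GPU group sends seven times as much data when multicast is unavailable. Real slowdowns depend on many other factors, so this ratio should not be expected to map directly onto the reported 61% figure.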

Why this matters

The NVLink multicast regression points to a structural trade-off in NVIDIA's confidential computing design. To achieve memory encryption and isolation, Blackwell's confidential mode runs without NVLink multicast, which is a hardware-level feature. This appears to be a design choice rather than a software bug, which would mean it cannot simply be patched and carries lasting performance implications for any workload that requires confidential computing.

For enterprise customers deploying large models in regulated environments (finance, healthcare, government), this is a significant problem. They must choose between security (confidential computing) and performance (NVLink multicast). The 61% regression makes large-model inference under confidential computing nearly impractical for latency-sensitive applications.
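
The sources do not say whether the 61% figure is a drop in throughput or an increase in latency, and the two readings imply different slowdowns. The sketch below works through both. The function names are hypothetical, and either interpretation of the metric is an assumption.

```python
from typing import Tuple


def from_throughput_drop(regression: float) -> Tuple[float, float]:
    """Treat `regression` as the fractional loss in throughput.

    Returns (relative throughput, relative time per unit of work).
    """
    if not 0.0 <= regression < 1.0:
        raise ValueError("regression must be in [0, 1)")
    remaining = 1.0 - regression
    return remaining, 1.0 / remaining


def from_latency_increase(regression: float) -> Tuple[float, float]:
    """Treat `regression` as the fractional increase in latency.

    Returns (relative throughput, relative time per unit of work).
    """
    if regression < 0.0:
        raise ValueError("regression must be non-negative")
    slowdown = 1.0 + regression
    return 1.0 / slowdown, slowdown


throughput_reading = from_throughput_drop(0.61)
latency_reading = from_latency_increase(0.61)
```

Read as a throughput drop, 61% leaves about 39% of baseline throughput, or roughly 2.56x the time per unit of work. Read as a latency increase, it is a 1.61x slowdown that leaves about 62% of baseline throughput.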

Broader context

This may not be an isolated incident. Earlier this year, NVIDIA's Grace Hopper superchip faced criticism for memory bandwidth limitations in confidential computing mode. The pattern could indicate that NVIDIA is prioritizing time-to-market over rigorous validation of security features. Competitors such as AMD, whose MI300X reportedly supports confidential computing with full interconnect bandwidth, could capitalize on this weakness.

What to watch

Watch for NVIDIA's response—either a firmware update that mitigates the regression (unlikely given the hardware nature) or a revised whitepaper acknowledging the limitation. Also monitor @verdacloud's GitHub ticket for updates and any benchmark comparisons from AMD or Intel showcasing their confidential computing performance on large models.

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The NVLink multicast regression is a clear example of NVIDIA prioritizing security certification over performance. By disabling multicast, Blackwell's confidential computing mode achieves memory encryption, but at a cost that makes large-model inference impractical for latency-sensitive use. This appears to be a design trade-off that NVIDIA chose to accept rather than a bug. The comparison with Hopper makes the picture worse: there, confidential computing left NVLink unencrypted, so NVIDIA shipped a 'secure' product whose GPU-to-GPU traffic was not protected, and then closed that gap at the expense of performance. Customers lose either way. AMD's MI300X reportedly supports confidential computing with full interconnect bandwidth, which would make it an attractive alternative for enterprises that need both security and performance. If NVIDIA does not address this quickly, it could lose high-value regulated workloads to AMD.