
NVIDIA Open-Sources MRC, the RDMA Protocol Powering OpenAI's Blackwell Clusters

NVIDIA open-sourced MRC, a multi-path RDMA protocol used by OpenAI on Blackwell clusters, enabling microsecond rerouting across 64 paths.


TL;DR

NVIDIA open-sourced MRC, a new RDMA protocol. · It powers OpenAI's Blackwell clusters. · Spreads connections across multiple network paths.

NVIDIA open-sourced MRC, the RDMA transport protocol powering OpenAI's Blackwell clusters. The protocol spreads a connection across up to 64 network paths, rerouting traffic in hardware within microseconds when a path fails.

Key facts

  • MRC spreads connections across up to 64 network paths.
  • Traffic reroutes in hardware within microseconds.
  • OpenAI uses MRC on Blackwell clusters.
  • Microsoft and Oracle are named as major deployments.
  • NVIDIA opened MRC through the Open Compute Project.

NVIDIA open-sourced MRC, a multi-path RDMA transport protocol designed for massive AI training clusters, according to a post by @kimmonismus. The protocol, already deployed in OpenAI's Blackwell clusters, spreads a single connection across multiple network paths—up to 64, per NVIDIA's documentation—enabling hardware-level traffic rerouting within microseconds when a path fails or becomes congested.
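The mechanics can be pictured with a toy model: one logical connection striped across many paths, with failed paths masked out so traffic immediately shifts to the survivors. This is an illustrative sketch only — the class and method names below are invented for the example, not NVIDIA's actual API, and a real NIC does the rerouting in hardware rather than in Python.

```python
import random

class MultiPathConnection:
    """Toy model of a multi-path transport in the spirit of MRC:
    a single logical connection sprayed across up to 64 paths.
    Names are illustrative, not NVIDIA's API."""

    def __init__(self, num_paths=64):
        self.paths = list(range(num_paths))
        self.healthy = set(self.paths)

    def send(self, packet_id):
        # Spray each packet over any currently healthy path.
        return random.choice(sorted(self.healthy))

    def mark_failed(self, path):
        # In MRC this reroute happens in NIC hardware within
        # microseconds; here it is just a set removal.
        self.healthy.discard(path)

conn = MultiPathConnection(num_paths=64)
conn.mark_failed(0)
conn.mark_failed(1)
# Subsequent packets avoid the failed paths automatically.
chosen = {conn.send(i) for i in range(10_000)}
assert 0 not in chosen and 1 not in chosen
```

The point of the sketch is the failure mode: because no single path carries the whole connection, losing a path degrades capacity slightly instead of stalling the flow.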

The Network Bottleneck

This matters because frontier training is no longer only about GPUs; the network is becoming one of the biggest bottlenecks in AI factories. As cluster sizes grow to tens of thousands of GPUs, single-path RDMA (Remote Direct Memory Access) links create fragile chokepoints. MRC's multi-path design mirrors techniques from Multipath TCP (MPTCP) but implements them in hardware at the transport layer, avoiding the latency overhead of software-based retransmission.
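The fragility of the single-path status quo comes from how Ethernet fabrics typically place flows: ECMP hashes a flow identifier once and pins every packet of that flow to one path. A minimal sketch of that pinning, with an invented flow tuple for illustration:

```python
import hashlib

def ecmp_path(flow_tuple, num_paths):
    # Classic ECMP-style placement: hash the flow identifier once,
    # so every packet of the flow rides the same path -- and shares
    # that path's fate if it fails or congests.
    digest = hashlib.sha256(repr(flow_tuple).encode()).digest()
    return digest[0] % num_paths

# RoCEv2 carries RDMA over UDP destination port 4791 (tuple is illustrative).
flow = ("10.0.0.1", 49152, "10.0.0.2", 4791, "udp")
assert ecmp_path(flow, 64) == ecmp_path(flow, 64)  # pinned to one path
```

A multi-path transport like MRC removes exactly this single point of failure by striping the connection across many such paths instead of hashing it onto one.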

Deployment and Adoption

OpenAI is already using MRC on Blackwell clusters. Microsoft and Oracle are also named by NVIDIA as major deployments [per @kimmonismus]. The protocol is optimized for NVIDIA's Spectrum-X Ethernet platform, but by opening it through the Open Compute Project (OCP), NVIDIA is pushing Ethernet into territory historically associated with InfiniBand—NVIDIA's own higher-performance interconnect.

The Strategic Play

This is a classic NVIDIA platform move: an open standard on the surface, a stronger full-stack NVIDIA advantage underneath. By open-sourcing MRC through OCP, NVIDIA gains the credibility of an open standard while ensuring the protocol is optimized first for its own Spectrum-X hardware. Competitors like Broadcom and Intel, which also target AI Ethernet fabrics, will need to implement MRC-compatible endpoints or risk losing compatibility with the largest AI clusters. The unique take: this is not just a protocol release; it is a moat-builder disguised as an open-source contribution, locking in the networking layer of AI factories just as CUDA locked in the compute layer.

What to watch


Watch for Broadcom and Intel to announce MRC-compatible Ethernet endpoints in the next 6–9 months. Also track whether OCP ratifies MRC as a standard—if it does, NVIDIA's networking moat deepens; if not, the protocol remains a de facto standard limited to NVIDIA's ecosystem.


AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala AYADI.


AI Analysis

The MRC open-sourcing is a strategic move that mirrors NVIDIA's CUDA playbook: create a proprietary advantage, then open-source it to set the standard while keeping the deepest optimization locked to your own hardware. The protocol's multi-path design is not novel in concept—MPTCP and similar techniques exist—but implementing it at the RDMA transport layer in hardware is a significant engineering achievement that lowers tail latency in lossy Ethernet fabrics.

The key question is whether MRC becomes a true open standard or a de facto NVIDIA standard. Competitors like Broadcom and Intel have their own Ethernet NICs and switches targeting AI workloads (e.g., Broadcom's Jericho3-AI). They will need to either adopt MRC—which ties them to NVIDIA's optimizations—or develop competing multi-path RDMA protocols and fight for ecosystem adoption. Given that OpenAI, Microsoft, and Oracle are already deploying MRC, the latter path is uphill.

The timing is notable: NVIDIA is pushing Ethernet into InfiniBand territory just as InfiniBand's own advantages (lower latency, higher reliability) are being eroded by Ethernet advancements. MRC directly addresses Ethernet's historical weakness in handling congestion at scale—the very problem that made InfiniBand the default for the largest clusters. If MRC works as advertised, it could accelerate the shift from InfiniBand to Ethernet for AI training, a shift NVIDIA is uniquely positioned to monetize through its Spectrum-X portfolio.