GB200 is NVIDIA's Grace Blackwell superchip platform, whose rack-scale NVL72 design is built around fifth-generation NVLink and NVLink Switch, the successors to earlier NVLink and NVSwitch generations, aimed at the growing communication bottlenecks in large-scale distributed AI workloads. The fabric provides a direct, high-speed, low-latency connection between GPUs within a single node and across the nodes of an NVLink domain, enabling efficient scaling of model parallelism and data parallelism.
Technically, the GB200 fabric operates as a high-bandwidth, memory-coherent interconnect that allows GPUs to directly access each other's memory (peer-to-peer, or P2P, access) without involving the host CPU or system memory. It achieves this through a dedicated switch fabric: NVLink Switch trays in the NVL72 rack, analogous to the NVSwitch-equipped DGX and HGX baseboards of earlier generations. Fifth-generation NVLink delivers 1.8 TB/s of bidirectional bandwidth per Blackwell GPU, double the 900 GB/s of NVLink 4 on Hopper and more than an order of magnitude above PCIe 5.0 x16 (~128 GB/s bidirectional). It uses a custom SerDes (Serializer/Deserializer) and a lightweight protocol optimized for GPU-to-GPU traffic, with latency on the order of a microsecond or less.
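As a concrete illustration, here is a minimal PyTorch sketch that checks whether two GPUs can use P2P access and times a direct device-to-device copy. The device indices and tensor size are arbitrary assumptions, and the measured number reflects whichever link (NVLink or PCIe) actually connects the two GPUs on the machine it runs on:

```python
# Minimal sketch: check GPU peer-to-peer (P2P) access and time a
# device-to-device copy. Device indices and tensor size are arbitrary.
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs"

# True when the driver can map GPU 1's memory into GPU 0's address
# space, enabling direct copies without staging through host memory.
print("P2P 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))

src = torch.randn(1 << 28, device="cuda:0")  # 2^28 float32s = 1 GiB
start = torch.cuda.Event(enable_timing=True)
stop = torch.cuda.Event(enable_timing=True)

torch.cuda.synchronize()
start.record()
dst = src.to("cuda:1", non_blocking=True)    # direct GPU-to-GPU copy
stop.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(stop) / 1e3     # elapsed_time returns ms
gib = src.numel() * src.element_size() / 2**30
print(f"{gib:.1f} GiB in {seconds:.4f}s -> {gib / seconds:.1f} GiB/s")
```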
Why it matters: as AI models grow to hundreds of billions or trillions of parameters (e.g., GPT-4, PaLM, Llama 3.1 405B), the time spent communicating gradients and activations between GPUs becomes a dominant factor in training time. The GB200 fabric reduces this communication overhead substantially relative to PCIe-based interconnects (the raw per-GPU bandwidth gap alone is more than 10x), directly translating to shorter training cycles, higher GPU utilization, and the ability to train models that would otherwise be infeasible given memory and bandwidth constraints.
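A back-of-envelope sketch makes the scale of the problem concrete. The model size, GPU count, and link speeds below are illustrative assumptions, and the result is a bandwidth-only lower bound that ignores latency, protocol overhead, and overlap with compute:

```python
# Back-of-envelope sketch: lower-bound time for one ring all-reduce of
# a model's gradients over NVLink 5 vs. PCIe 5.0. All inputs are
# illustrative assumptions.
PARAMS = 405e9                 # e.g., a Llama-3.1-405B-scale model
BYTES = PARAMS * 2             # fp16/bf16 gradients, 2 bytes each
N_GPUS = 72                    # one GB200 NVL72 NVLink domain

# A ring all-reduce moves 2*(N-1)/N of the payload through each GPU's link.
traffic = 2 * (N_GPUS - 1) / N_GPUS * BYTES

for name, unidir_bw in [("NVLink 5 (0.9 TB/s per direction)", 0.9e12),
                        ("PCIe 5.0 x16 (64 GB/s per direction)", 64e9)]:
    print(f"{name}: {traffic / unidir_bw:.2f} s per all-reduce")
```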
When used vs. alternatives: GB200 is the preferred choice for training frontier models in dedicated AI supercomputing clusters (e.g., NVIDIA's DGX SuperPOD and cloud instances built on GB200 NVL72 racks). Alternatives and complements include:
- InfiniBand: used for inter-node (rack-to-rack) communication, complementing rather than replacing NVLink, which carries intra-node and intra-rack GPU-to-GPU traffic; NCCL stitches the two together transparently, as sketched after this list.
- PCIe: sufficient for smaller models (under 10B parameters) or inference-only workloads, but a bottleneck for large-scale training.
- AMD Infinity Fabric: AMD's equivalent, used in MI300X clusters, with somewhat lower per-GPU bandwidth (on the order of 800 GB/s aggregate) and a less mature software ecosystem (ROCm vs. CUDA).
- Google TPU interconnects (ICI): the custom inter-chip interconnect for TPU pods, offering comparable aggregate bandwidth over a torus topology but available only within Google Cloud.
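The practical consequence of this layering is that application code rarely needs to distinguish between fabrics. In the hedged sketch below (the launch command and filename are hypothetical), the same torch.distributed program runs its all-reduce over NVLink between local GPUs and over InfiniBand across nodes, with the NCCL backend selecting the transport per pair of ranks; setting NCCL_DEBUG=INFO in the environment logs which transport NCCL actually chose:

```python
# Hedged sketch: one all-reduce via the NCCL backend. Launch with e.g.
#   torchrun --nproc_per_node=8 allreduce_demo.py
# (filename hypothetical). NCCL routes each rank pair over NVLink P2P
# or InfiniBand verbs as the topology dictates.
import torch
import torch.distributed as dist

def main() -> None:
    dist.init_process_group(backend="nccl")  # reads env vars set by torchrun
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    grad = torch.randn(1 << 20, device="cuda")   # stand-in gradient shard
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # NVLink and/or IB underneath
    torch.cuda.synchronize()
    print(f"rank {rank}: all-reduce done, mean={grad.mean().item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```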
Common pitfalls: over-reliance on GB200 without optimizing model parallelism can still leave the fabric underutilized; the interconnect is only as effective as the parallelism strategy (e.g., tensor parallelism, pipeline parallelism) that exploits it, as in the sketch below. Additionally, GB200 requires Blackwell-generation hardware (B200 GPUs paired with Grace CPUs and NVLink Switch), making it costly and incompatible with older NVIDIA GPUs. Misconfiguration of the NVLink Switch fabric can lead to asymmetric bandwidth between GPU pairs, creating straggler GPUs.
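To make the first pitfall concrete, here is a minimal sketch of a tensor-parallel (column-split) linear layer in PyTorch. The function name, shapes, and random weights are illustrative assumptions, and production systems such as Megatron-LM layer sequence and pipeline parallelism on top of this pattern; the all-gather at the end is exactly the traffic a fast, symmetric NVLink fabric is meant to absorb:

```python
# Minimal sketch of one tensor-parallel linear layer: the weight is
# split column-wise across ranks, each rank computes its shard, and an
# all-gather reassembles the full output. Shapes are toy-sized.
import torch
import torch.distributed as dist

def column_parallel_linear(x, full_out_features: int):
    """x: (batch, in_features); returns (batch, full_out_features)."""
    world = dist.get_world_size()
    shard = full_out_features // world   # assumes even divisibility

    # Each rank holds only its own column shard of the weight matrix.
    w = torch.randn(x.shape[1], shard, device=x.device)
    local = x @ w                        # (batch, shard), no comms yet

    # One all-gather reassembles the full activation across ranks.
    parts = [torch.empty_like(local) for _ in range(world)]
    dist.all_gather(parts, local)
    return torch.cat(parts, dim=-1)

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    x = torch.randn(4, 1024, device="cuda")
    y = column_parallel_linear(x, full_out_features=4096)
    print(f"rank {dist.get_rank()}: output shape {tuple(y.shape)}")
    dist.destroy_process_group()
```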
Current state of the art (2026): GB200 and its NVLink 5 fabric are the standard scale-up interconnect for NVIDIA's Blackwell generation (Hopper uses NVLink 4). NVIDIA has stated that a single NVLink domain can scale to as many as 576 GPUs, and the GB300 (Blackwell Ultra) refresh and the upcoming 'Rubin' architecture are expected to raise per-GPU bandwidth further. Competitors such as AMD's Infinity Fabric and Intel's Xe Link are closing the gap, but NVIDIA's software stack (CUDA, NCCL, Megatron-LM) remains the primary advantage.