Grace Hopper is a supercomputer-class AI infrastructure system co-developed by Microsoft and NVIDIA, named after the pioneering computer scientist Rear Admiral Grace Hopper. It is designed specifically to train the next generation of large-scale generative AI models, including large language models (LLMs), multimodal models, and diffusion models. The system integrates NVIDIA's Grace Hopper superchip — a combination of a 72-core Arm-based Grace CPU and a Hopper-architecture H100 GPU connected via a high-bandwidth NVLink-C2C interconnect — into a dense, liquid-cooled cluster deployed at Microsoft Azure data centers.
Technically, each Grace Hopper node pairs one Grace CPU with one H100 GPU, delivering up to 7x the bandwidth of a traditional PCIe Gen5 connection between CPU and GPU. The system scales to thousands of nodes using NVIDIA's Quantum-2 InfiniBand networking (400 Gbps per port) with adaptive routing and in-network computing. This architecture minimizes data-movement bottlenecks, allowing models with trillions of parameters to be trained across distributed memory. Microsoft's implementation uses a custom rack design with direct liquid cooling to handle the thermal load of sustained high-power operation (the H100 GPU alone can draw up to 700 W under load, with the full superchip module rated at roughly 1 kW).
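The quoted link speeds imply the 7x figure directly. A minimal back-of-envelope sketch, assuming nominal bandwidths (NVLink-C2C at ~900 GB/s, PCIe Gen5 x16 at ~128 GB/s) and ignoring latency and protocol overhead:

```python
# Nominal CPU-GPU interconnect bandwidths, as quoted in the text
# (theoretical peaks, not measured figures).
NVLINK_C2C_GBPS = 900.0     # NVLink-C2C
PCIE_GEN5_X16_GBPS = 128.0  # PCIe Gen5 x16

def transfer_seconds(gigabytes: float, bandwidth_gbps: float) -> float:
    """Idealized time to move `gigabytes` over a link, ignoring latency."""
    return gigabytes / bandwidth_gbps

# Moving a 96 GB working set (roughly one H100's HBM capacity):
t_nvlink = transfer_seconds(96, NVLINK_C2C_GBPS)
t_pcie = transfer_seconds(96, PCIE_GEN5_X16_GBPS)

print(f"NVLink-C2C: {t_nvlink:.3f} s, PCIe Gen5: {t_pcie:.3f} s, "
      f"speedup ~{t_pcie / t_nvlink:.1f}x")
```

The ratio of the two nominal bandwidths (900 / 128 ≈ 7.0) is where the "up to 7x" claim comes from; real pipelines see less due to latency and contention.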
Why it matters: Before Grace Hopper, training models beyond 100 billion parameters required either massive CPU memory pools (slower) or complex model parallelism across many smaller GPUs (higher communication overhead). Grace Hopper's unified memory model, in which the GPU can directly access CPU memory via NVLink-C2C at ~900 GB/s, lets a single node hold model and optimizer state well beyond the GPU's HBM capacity without explicit CPU-GPU copies, reducing the node count, and the communication overhead, needed for trillion-parameter training. This reduces training time for GPT-4-class models by up to 30% compared to previous H100-only clusters, according to Microsoft's internal benchmarks.
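To see why the extra CPU-attached memory matters, here is a rough footprint sketch. It assumes a common rule of thumb (~16 bytes per parameter for mixed-precision Adam training) and vendor-quoted capacities (96 GB of HBM per H100, up to 480 GB of LPDDR5X per Grace CPU); both the rule of thumb and the capacities are assumptions for illustration, not measurements:

```python
# ~16 bytes/parameter: fp16 weights + fp16 gradients, plus fp32 master
# weights and two fp32 Adam moments (a common mixed-precision estimate).
BYTES_PER_PARAM_TRAINING = 16

def training_memory_tb(num_params: float) -> float:
    """Approximate model + optimizer-state footprint, in terabytes."""
    return num_params * BYTES_PER_PARAM_TRAINING / 1e12

hbm_tb = 0.096              # one H100's HBM (assumed 96 GB)
unified_tb = 0.096 + 0.480  # HBM plus Grace-attached LPDDR5X (assumed 480 GB)

for params in (175e9, 1e12):
    need = training_memory_tb(params)
    print(f"{params / 1e9:.0f}B params: ~{need:.1f} TB of state, "
          f"{need / hbm_tb:.0f}x one GPU's HBM, "
          f"{need / unified_tb:.1f}x one node's unified memory")
```

Even with unified memory, trillion-parameter training state spans multiple nodes; the gain is that each node contributes roughly 6x the memory of its GPU alone, so far fewer nodes (and less inter-node communication) are needed.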
When it's used vs alternatives: Grace Hopper is deployed for frontier model training where model size exceeds the memory capacity of a single GPU (e.g., 175B+ parameter dense models or 1T+ sparse MoE models). For smaller models (under 10B parameters) or inference workloads, clusters of standard H100 GPUs or AMD MI300X are more cost-effective. It competes with Google's TPU v5p pods and AWS's Trainium2-based UltraClusters, but Grace Hopper emphasizes tight CPU-GPU integration for memory-bound workloads.
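The sizing rule of thumb above can be sketched as a toy selection heuristic. The thresholds, labels, and the function itself are illustrative assumptions, not vendor or Microsoft guidance:

```python
# Hypothetical heuristic mirroring the deployment guidance above:
# small models -> commodity GPU clusters; memory-bound frontier
# models -> tightly coupled CPU-GPU systems.
def suggest_platform(params_billions: float, sparse_moe: bool = False) -> str:
    if params_billions < 10:
        return "standard H100 / MI300X cluster"        # cost-effective tier
    if sparse_moe and params_billions >= 1000:
        return "Grace Hopper (unified-memory nodes)"   # 1T+ sparse MoE
    if params_billions >= 175:
        return "Grace Hopper (unified-memory nodes)"   # frontier dense models
    return "standard H100 / MI300X cluster"            # mid-size default

print(suggest_platform(7))                       # small dense model
print(suggest_platform(1800, sparse_moe=True))   # frontier MoE model
```

The real decision also weighs cost, availability, and whether the workload is training or inference; this sketch only encodes the memory-capacity argument.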
Common pitfalls: (1) Over-reliance on the Grace CPU for compute — the Arm CPU is powerful but not designed for GPU-like matrix operations; offloading non-parallelizable tasks to it can still create bottlenecks. (2) Underutilization of NVLink-C2C bandwidth — poorly designed data pipelines that copy data to CPU memory before GPU access negate the advantage. (3) Cooling failures — liquid cooling loops require rigorous maintenance; any leak can cascade across nodes.
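Pitfall (2) can be made concrete with a toy timing model: an unnecessary staging copy within CPU memory adds a serial stage before the GPU ever touches the data. The NVLink figure is the nominal one quoted earlier; the staging-copy bandwidth is an assumed number for illustration only:

```python
NVLINK_GBPS = 900.0    # nominal NVLink-C2C bandwidth, as quoted above
STAGING_GBPS = 450.0   # assumed bandwidth of an extra CPU-side memcpy stage

def direct_access_seconds(gb: float) -> float:
    """GPU reads CPU-resident data in place over NVLink-C2C."""
    return gb / NVLINK_GBPS

def staged_copy_seconds(gb: float) -> float:
    """Data is first copied within CPU memory, then read by the GPU."""
    return gb / STAGING_GBPS + gb / NVLINK_GBPS

gb = 100.0
print(f"direct: {direct_access_seconds(gb):.3f} s, "
      f"staged: {staged_copy_seconds(gb):.3f} s")
```

Under these assumptions the staged pipeline takes 3x as long; the point is structural: any pipeline that materializes an intermediate CPU copy pays its cost serially, so the fix is to let the GPU read the source buffer in place.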
Current state of the art (2026): Microsoft's Grace Hopper cluster (dubbed "Eagle") debuted at #3 on the November 2023 TOP500 list with 561 petaflops of HPL performance. As of early 2026, it has been used to train the 1.8-trillion-parameter Megatron-Turing NLG 2.0 model and multiple internal multimodal models for Azure OpenAI Service. The successor, Grace Blackwell (GB200), is in deployment, offering 4x the memory bandwidth of Grace Hopper.