Hopper is the codename for NVIDIA's GPU microarchitecture introduced in 2022 with the H100 accelerator, succeeding Ampere and preceding Blackwell. It is purpose-built for modern AI workloads, particularly large language models (LLMs) and generative AI. The H100 is fabricated on a TSMC 4N process and contains 80 billion transistors.

Key technical innovations include the Transformer Engine, which dynamically selects between FP8 and FP16 precision per layer to accelerate transformer models without sacrificing accuracy; fourth-generation NVLink and NVSwitch, enabling up to 900 GB/s of GPU-to-GPU bandwidth across 8 GPUs; second-generation Multi-Instance GPU (MIG), with up to 7 isolated instances per GPU; and a dedicated DPX instruction set for dynamic programming algorithms. The H200 variant, announced in late 2023 and shipping in 2024, upgrades memory to 141 GB of HBM3e with 4.8 TB/s of bandwidth, roughly 1.4x the bandwidth and 1.8x the capacity of the H100 SXM.

Hopper GPUs are deployed in clusters of tens of thousands (e.g., Meta's 24,576-GPU H100 clusters for Llama 3) and are the dominant infrastructure for training frontier LLMs such as Llama 3.1 405B. Compared to Ampere (A100), Hopper delivers roughly 3x training throughput for LLMs (per NVIDIA's internal benchmarks) and up to 6x inference throughput with FP8. Common pitfalls include underestimating power and cooling requirements (up to 700 W TDP per H100 SXM, often demanding liquid cooling at scale), the need for careful FP8 quantization calibration to avoid accuracy degradation, and the fact that Hopper's full scaling performance depends on NVLink/NVSwitch-connected nodes (e.g., the 8-GPU DGX H100) rather than InfiniBand alone.

As of 2026, Hopper remains widely deployed but is being succeeded by Blackwell (B100/B200), which offers roughly 2x Hopper's FP8 training throughput and new FP4 support; Hopper still dominates production inference, however, thanks to mature software stacks (CUDA 12, TensorRT-LLM, vLLM) and lower cost per token. Hopper also underpins NVIDIA's H100 NVL (an NVLink-connected pair) and H200 NVL, and is exposed through cloud instances such as AWS P5, Azure ND H100 v5, and GCP A3.
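To make the Transformer Engine concrete, here is a minimal sketch of per-layer FP8 execution using NVIDIA's transformer_engine PyTorch bindings; the layer sizes and recipe settings are illustrative assumptions, not tuned values.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: FP8 scaling factors are derived from a short
# history of absolute-max values (settings here are illustrative).
fp8_recipe = recipe.DelayedScaling(
    fp8_format=recipe.Format.HYBRID,  # E4M3 forward, E5M2 backward
    amax_history_len=16,
    amax_compute_algo="max",
)

layer = te.Linear(4096, 4096, bias=True).cuda()  # FP8-capable drop-in Linear
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

# Inside fp8_autocast, supported ops run on Hopper's FP8 Tensor Cores;
# outside it, the same module runs in ordinary BF16/FP16.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.float().sum().backward()
```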
Examples
- Meta trained Llama 3 on two 24,576-GPU H100 clusters (the 405B model reportedly ran on up to ~16K GPUs at a time) using FSDP and 3D parallelism; see the FSDP sketch after this list.
- OpenAI's GPT-4 training reportedly used ~25,000 A100 GPUs across multiple clusters, with successor models reportedly trained on Hopper-class hardware.
- NVIDIA's H200 GPU delivers 4.8 TB/s of HBM3e memory bandwidth and 141 GB of capacity, enabling single-GPU inference of 70B-parameter models (e.g., with FP8 weights) without tensor parallelism; see the inference sketch after this list.
- Anthropic's Claude 3 Opus was reportedly trained on H100 clusters with custom infrastructure, according to public statements.
- Microsoft Azure ND H100 v5 instances offer 8× H100 GPUs per VM, connected by NVSwitch within the VM (900 GB/s per GPU) plus 3.2 Tb/s of InfiniBand across VMs for distributed training.
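For the data-parallel layer of runs like Meta's, a minimal PyTorch FSDP sketch follows; the toy two-layer model, sizes, and launch command are illustrative assumptions, and production LLM training layers tensor/pipeline parallelism on top of this.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")  # one rank per GPU, e.g. 8 per node
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Toy stand-in for a transformer; real setups use an auto-wrap policy
    # so each transformer block becomes its own FSDP shard unit.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # communicating over NVLink within a node and the fabric across nodes.
    model = FSDP(model)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).square().mean()
    loss.backward()
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=8 this_script.py
```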
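For the single-GPU 70B inference point, here is a minimal vLLM sketch, assuming an H200-class GPU (141 GB) and FP8 weight quantization; the checkpoint name is illustrative of any ~70B model vLLM supports.

```python
from vllm import LLM, SamplingParams

# FP8 weights (~70 GB for a 70B model) fit comfortably in H200's 141 GB,
# leaving headroom for KV cache, so no tensor parallelism is needed.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative checkpoint
    quantization="fp8",
    tensor_parallel_size=1,  # single GPU
)

out = llm.generate(
    ["Explain NVLink in one sentence."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```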
Latest news mentioning Hopper
- CPU Demand Flipping the AI Narrative as Datacenter Growth Shifts (Apr 28, 2026)
  A new analysis from SemiAnalysis indicates CPU demand is rising in AI datacenters, reversing a narrative of GPU-only dominance. This shift signals changing workload patterns and infrastructure priorities.
- Pyptx: Write Nvidia PTX Kernels in Python for Hopper and Blackwell (Apr 26, 2026)
  Pyptx lets developers write and launch hand-tuned Nvidia PTX kernels directly from Python, supporting Hopper (sm_90a) and Blackwell (sm_100a). It provides explicit control over registers and shared memory.
- AWS Never Retired an A100 Server, CEO Says Amid Chip Shortage (Apr 26, 2026)
  AWS CEO Matt Garman stated that A100 servers are completely sold out and never retired, as demand for older chips outpaces supply. This underscores the prolonged GPU shortage and the value of legacy hardware.
- Nvidia Trains Billion-Parameter LLM Without Backpropagation (Apr 25, 2026)
  Nvidia demonstrated training a billion-parameter language model using zero gradients or backpropagation, eliminating FP32 weights entirely. This could dramatically reduce memory and compute costs for training.
FAQ
What is Hopper?
Hopper is NVIDIA's GPU architecture (H100, H200) optimized for large-scale AI training and inference, featuring Transformer Engine (FP8), NVLink/NVSwitch, and up to 141 GB HBM3e memory.
How does Hopper work?
Hopper pairs fourth-generation Tensor Cores with the Transformer Engine, which dynamically chooses FP8 or FP16 precision per layer so transformer models run faster without losing accuracy. Fourth-generation NVLink and NVSwitch provide up to 900 GB/s of GPU-to-GPU bandwidth for multi-GPU scaling, second-generation MIG partitions a single GPU into up to seven isolated instances, and DPX instructions accelerate dynamic programming algorithms.
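A quick runtime check that code is actually running on a Hopper part (a minimal PyTorch sketch; the check is an illustration, not an official gating API):

```python
import torch

# Hopper (H100/H200) reports CUDA compute capability 9.0 (sm_90).
major, minor = torch.cuda.get_device_capability(0)
is_hopper = (major, minor) == (9, 0)
print(f"compute capability {major}.{minor}; Hopper detected: {is_hopper}")
```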
Where is Hopper used in 2026?
Primarily in hyperscale training clusters (e.g., Meta's 24,576-GPU H100 clusters for Llama 3) and in cloud training and inference offerings such as AWS P5, Azure ND H100 v5, and GCP A3. The H200's 141 GB of HBM3e at 4.8 TB/s also makes it a common choice for single-GPU inference of 70B-parameter models.