Hopper is the codename for NVIDIA's GPU microarchitecture introduced in 2022 with the H100 accelerator, succeeding Ampere and preceding Blackwell. It is purpose-built for modern AI workloads, particularly large language models (LLMs) and generative AI. The H100 is fabricated on a TSMC 4N process and contains 80 billion transistors.

Key technical innovations include the Transformer Engine, which dynamically selects between FP8 and FP16 precision per layer to accelerate transformer models with minimal accuracy loss; fourth-generation NVLink with third-generation NVSwitch, providing up to 900 GB/s of NVLink bandwidth per GPU within an 8-GPU node; second-generation Multi-Instance GPU (MIG) with up to 7 instances per GPU; and DPX instructions for dynamic programming algorithms.

The H200 variant, announced in late 2023 and shipping in 2024, upgrades memory to 141 GB of HBM3e with 4.8 TB/s of bandwidth. Relative to the 80 GB, 3.35 TB/s H100 SXM, that is roughly 1.76x the capacity and 1.4x the memory bandwidth.

Hopper GPUs are deployed in clusters of thousands (e.g., Meta's 24,576-GPU H100 clusters, used for Llama 3) and became the dominant infrastructure for training frontier models such as Llama 3.1 405B. Not every frontier model ran on Hopper: GPT-4 finished training in 2022, before H100 systems were broadly available, reportedly on A100 GPUs, and Google reports training Gemini on its own TPUs. Compared to Ampere (A100), Hopper delivers roughly 3x training throughput for LLMs and up to 6x inference throughput with FP8, according to NVIDIA's internal benchmarks.

Common pitfalls include underestimating power and cooling requirements (up to 700 W TDP per H100 SXM, which often pushes dense deployments toward liquid cooling); the need for careful FP8 scaling and calibration to avoid accuracy degradation; and the fact that NVLink's full bandwidth is available only inside an NVLink/NVSwitch domain (such as an 8-GPU DGX H100), so traffic that crosses nodes over InfiniBand sees much lower per-GPU bandwidth.

As of 2026, Hopper remains widely deployed but is being succeeded by Blackwell (B100/B200), which offers ~2x Hopper's FP8 training throughput and adds FP4 support; however, Hopper still dominates production inference due to established software stacks (CUDA 12, TensorRT-LLM, vLLM) and lower cost per token.
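The FP8 calibration pitfall mentioned above can be illustrated with a small software simulation. The sketch below is not the Transformer Engine API; it is a pure-Python approximation of rounding to the FP8 E4M3 format (3 mantissa bits, largest finite value 448), and the helper names and sample activations are made up for illustration. It compares quantizing small values directly against first scaling them so the largest magnitude lands at the top of the E4M3 range.

```python
import math

E4M3_MAX = 448.0      # largest finite E4M3 value
E4M3_MIN_EXP = -6     # exponent of the smallest normal E4M3 number
MANTISSA_BITS = 3


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so floor(log2(a)) == ex - 1.
    exp = max(math.frexp(a)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between neighbouring values
    return sign * min(round(a / step) * step, E4M3_MAX)


def fake_quant(values, scale):
    """Scale, quantize to E4M3, then unscale (a quantize/dequantize round trip)."""
    return [quantize_e4m3(v * scale) / scale for v in values]


def max_rel_error(values, approx):
    """Worst-case relative error over the non-zero inputs."""
    return max(abs(a - v) / abs(v) for v, a in zip(values, approx) if v != 0.0)


# Hypothetical activations with small magnitudes (illustrative only).
activations = [0.0012, -0.0007, 0.0031, 0.0004, -0.0025]

amax = max(abs(v) for v in activations)
scale = E4M3_MAX / amax  # per-tensor scale derived from the observed maximum

err_unscaled = max_rel_error(activations, fake_quant(activations, 1.0))
err_scaled = max_rel_error(activations, fake_quant(activations, scale))
print(f"max relative error, unscaled: {err_unscaled:.3f}, scaled: {err_scaled:.3f}")
```

Without scaling, the smallest values fall below the E4M3 subnormal spacing and round to zero; with a per-tensor scale, the error stays within the half-step bound of a 3-bit mantissa. Production FP8 recipes typically track the maximum over several steps rather than a single batch, which this sketch does not model.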
Hopper is also available as the PCIe-based H100 NVL and H200 NVL cards, which are joined by NVLink bridges. The 8-GPU SXM configurations are what the major cloud instances expose (AWS P5, Azure ND H100 v5, Google Cloud A3).
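To make the power pitfall concrete, here is a rough budgeting sketch based on the 700 W per-GPU figure quoted above. It counts GPUs only; the `overhead_factor` parameter is a hypothetical multiplier for CPUs, networking, fans, and cooling, and the 1.5 used in the example is an assumption, not a measured value.

```python
H100_SXM_TDP_W = 700.0   # per-GPU TDP quoted in the overview
GPUS_PER_NODE = 8        # 8-GPU node, as in the DGX H100 example


def gpu_power_kw(num_gpus: int, tdp_w: float = H100_SXM_TDP_W) -> float:
    """GPU-only power at full TDP, in kilowatts."""
    return num_gpus * tdp_w / 1000.0


def total_power_kw(num_gpus: int, overhead_factor: float = 1.0) -> float:
    """GPU power multiplied by a hypothetical overhead factor for everything else."""
    return gpu_power_kw(num_gpus) * overhead_factor


node_kw = gpu_power_kw(GPUS_PER_NODE)        # 5.6 kW of GPU power per node
cluster_mw = gpu_power_kw(24_576) / 1000.0   # about 17.2 MW for 24,576 GPUs

print(f"8-GPU node: {node_kw:.1f} kW (GPUs only)")
print(f"24,576-GPU cluster: {cluster_mw:.1f} MW (GPUs only)")
print(f"with an assumed 1.5x overhead: {total_power_kw(24_576, 1.5) / 1000.0:.1f} MW")
```

Even the GPU-only numbers show why rack power density and heat removal need to be planned before hardware arrives.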
Examples
- Meta trained Llama 3.1 405B on up to 16,384 H100 GPUs in its 24,576-GPU H100 clusters, combining tensor, pipeline, context, and FSDP-based data parallelism.
- OpenAI's GPT-4 is often cited alongside Hopper but predates its deployment: it finished training in 2022 and reportedly used about 25,000 A100 (Ampere) GPUs rather than H100s.
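- NVIDIA's H200 GPU pairs 141 GB of HBM3e with 4.8 TB/s of memory bandwidth, enough to serve a 70B-parameter model on a single GPU at FP8 (about 70 GB of weights) without tensor parallelism; at FP16, the roughly 140 GB of weights leave almost no room for the KV cache.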
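- Anthropic has said that the Claude 3 models were trained on hardware from AWS and Google Cloud; it has not publicly confirmed which accelerators were used, so attributing Claude 3 Opus to H100 clusters is unverified.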
- Microsoft Azure ND H100 v5 instances offer 8× H100 GPUs linked by NVLink within the VM, plus 3.2 Tb/s (8 × 400 Gb/s) of InfiniBand bandwidth per VM for distributed training.
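Interconnect figures are easy to misread because NVLink is usually quoted in gigabytes per second and InfiniBand in gigabits per second. The sketch below is a unit check under stated assumptions: eight 400 Gb/s InfiniBand links per VM, and 900 GB/s of NVLink per GPU treated as a bidirectional total (so it is halved for a one-way comparison). The 10 GB payload is hypothetical, and protocol overhead is ignored.

```python
NVLINK_TOTAL_GBYTE_S = 900.0   # per GPU, assumed to be a bidirectional total
IB_LINK_GBIT_S = 400.0         # per InfiniBand link, one direction (assumed)
IB_LINKS_PER_VM = 8            # one link per GPU (assumed)


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbit_per_s / 8.0


def transfer_ms(payload_gb: float, gbyte_per_s: float) -> float:
    """Ideal transfer time in milliseconds at peak bandwidth."""
    return payload_gb / gbyte_per_s * 1000.0


ib_aggregate_tbit_s = IB_LINKS_PER_VM * IB_LINK_GBIT_S / 1000.0  # 3.2 Tb/s per VM
ib_per_gpu_gbyte_s = gbit_to_gbyte(IB_LINK_GBIT_S)               # 50 GB/s per GPU
nvlink_one_way_gbyte_s = NVLINK_TOTAL_GBYTE_S / 2.0              # 450 GB/s per GPU

payload_gb = 10.0  # hypothetical amount of data to move per GPU
print(f"InfiniBand aggregate: {ib_aggregate_tbit_s:.1f} Tb/s per VM")
print(f"NVLink one-way:     {transfer_ms(payload_gb, nvlink_one_way_gbyte_s):.1f} ms")
print(f"InfiniBand one-way: {transfer_ms(payload_gb, ib_per_gpu_gbyte_s):.1f} ms")
```

Under these assumptions the per-GPU gap between intra-node and inter-node bandwidth is about 9x, which is why parallelism strategies usually keep the most communication-heavy traffic inside a node.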
FAQ
What is Hopper?
Hopper is NVIDIA's GPU architecture (H100, H200) optimized for large-scale AI training and inference, featuring Transformer Engine (FP8), NVLink/NVSwitch, and up to 141 GB HBM3e memory.
How does Hopper work?
Hopper accelerates transformer workloads through a combination of hardware features. The Transformer Engine dynamically selects between FP8 and FP16 precision per layer; fourth-generation NVLink and NVSwitch provide up to 900 GB/s of NVLink bandwidth per GPU within an 8-GPU node; second-generation MIG partitions a GPU into up to 7 instances; and DPX instructions speed up dynamic programming algorithms. The H100 itself is fabricated on a TSMC 4N process and contains 80 billion transistors.
Where is Hopper used in 2026?
Hopper GPUs run in large training clusters and in cloud inference fleets. Meta trained Llama 3.1 405B on up to 16,384 H100 GPUs, and cloud providers offer 8-GPU H100 instances (AWS P5, Azure ND H100 v5, Google Cloud A3). NVIDIA's H200 pairs 141 GB of HBM3e with 4.8 TB/s of memory bandwidth, enough to serve a 70B-parameter model at FP8 on a single GPU without tensor parallelism.
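The 70B-parameter claim above can be checked with simple arithmetic. This sketch estimates the weight footprint at a given precision, tests whether it fits on one GPU, and gives a rough bandwidth-bound ceiling on single-stream decode speed (each generated token reads every weight once). The 16 GB headroom for KV cache and activations is a hypothetical allowance, and the H100 figures (80 GB, 3.35 TB/s) are those of the SXM variant.

```python
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}
GPU_MEMORY_GB = {"H100": 80.0, "H200": 141.0}
GPU_BANDWIDTH_GB_S = {"H100": 3350.0, "H200": 4800.0}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in GB: 1e9 parameters at N bytes each is N GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_gpu(params_billion: float, dtype: str, gpu: str,
                headroom_gb: float = 16.0) -> bool:
    """True if weights plus a hypothetical headroom fit in one GPU's memory."""
    return weight_gb(params_billion, dtype) + headroom_gb <= GPU_MEMORY_GB[gpu]


def decode_tokens_per_s_ceiling(params_billion: float, dtype: str, gpu: str) -> float:
    """Upper bound on batch-1 decode rate if limited only by reading the weights."""
    return GPU_BANDWIDTH_GB_S[gpu] / weight_gb(params_billion, dtype)


for gpu in ("H100", "H200"):
    for dtype in ("fp16", "fp8"):
        ok = fits_on_gpu(70, dtype, gpu)
        print(f"70B {dtype} on {gpu}: weights {weight_gb(70, dtype):.0f} GB, fits={ok}")

print(f"H200 fp8 decode ceiling: {decode_tokens_per_s_ceiling(70, 'fp8', 'H200'):.1f} tokens/s")
```

The ceiling ignores KV-cache reads, kernel overheads, and batching, so real throughput per stream will be lower, while batched serving can raise aggregate throughput well above it.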