Hopper — Definition, Examples & Latest News | gentic.news

Hopper is the codename for NVIDIA's GPU microarchitecture introduced in 2022 with the H100 accelerator, succeeding Ampere and preceding Blackwell. It is purpose-built for modern AI workloads, particularly large language models (LLMs) and generative AI. The H100 is fabricated on a TSMC 4N process and contains 80 billion transistors.

Key technical innovations include the Transformer Engine, which dynamically selects between FP8 and FP16 precision per layer to accelerate transformer models while preserving accuracy; fourth-generation NVLink, which provides up to 900 GB/s of bandwidth per GPU, with NVSwitch connecting all 8 GPUs in a node; second-generation Multi-Instance GPU (MIG) with up to 7 instances per GPU; and dedicated DPX instructions for dynamic programming algorithms.

The H200 variant, announced in late 2023 and shipping in 2024, upgrades memory to 141 GB of HBM3e with 4.8 TB/s bandwidth. Relative to the H100 SXM (80 GB, 3.35 TB/s), that is roughly 1.4x the memory bandwidth and about 1.8x the capacity.

Hopper GPUs are deployed in clusters of thousands (e.g., Meta's 24,576-GPU H100 clusters used for Llama 3) and became the dominant infrastructure for training frontier models such as Llama 3.1 405B. Compared to Ampere (A100), Hopper delivers roughly 3x training throughput for LLMs (according to NVIDIA's internal benchmarks) and up to 6x inference throughput with FP8.

Common pitfalls include underestimating power and cooling requirements (700 W TDP per H100 SXM, which often makes liquid cooling attractive at scale); the need for careful FP8 quantization calibration to avoid accuracy degradation; and the fact that Hopper's full multi-GPU performance depends on NVLink/NVSwitch-connected GPUs within a node (e.g., the 8-GPU DGX H100), because inter-node links such as InfiniBand offer far less bandwidth per GPU.

As of 2026, Hopper remains widely deployed but is being succeeded by Blackwell (B100/B200), which offers ~2x Hopper's FP8 training throughput and adds FP4 support. Hopper still dominates production inference, however, thanks to established software stacks (CUDA 12, TensorRT-LLM, vLLM) and lower cost per token.
Hopper is also the foundation for NVIDIA's PCIe-based H100 NVL (an NVLink-bridged pair) and H200 NVL. The major 8-GPU cloud instances (AWS P5, Azure ND H100 v5, GCP A3) are built on H100 SXM GPUs.
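To make the FP8 calibration pitfall concrete, here is a minimal sketch of per-tensor scaling into the FP8 E4M3 range, written in plain Python. It is an illustration under stated assumptions, not Transformer Engine's API: the function names are hypothetical, the rounding model is simplified (no NaN handling, saturating overflow), and real implementations track the running maximum (amax) over many steps rather than a single tensor.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on a simplified E4M3 grid (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)      # clip instead of overflowing
    _, e = math.frexp(mag)           # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)             # -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)          # 3 mantissa bits: 8 steps per power of two
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_per_tensor(values):
    """Scale so the largest magnitude maps to E4M3_MAX, then round to FP8."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def dequantize(quantized, scale):
    """Undo the scaling to recover approximate original values."""
    return [q / scale for q in quantized]


if __name__ == "__main__":
    weights = [0.01, -0.02, 0.5, 3.0]          # hypothetical tensor values
    q, scale = quantize_per_tensor(weights)
    restored = dequantize(q, scale)
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.6f} (relative error {abs(r - w) / abs(w):.2%})")
    # Without scaling, out-of-range values simply saturate:
    print(round_to_e4m3(1000.0))
```

The scale is derived from the largest magnitude in the tensor, so a single outlier pushes small values toward the coarse end of the grid; that sensitivity is why calibration data and amax history matter in practice.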

Examples

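Meta built 24,576-GPU H100 clusters for Llama 3 training; the Llama 3 paper reports training the 405B model on up to 16K H100 GPUs using 4D parallelism (tensor, pipeline, context, and FSDP-based data parallelism).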
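For a sense of scale, the GPU power draw of a 24,576-GPU cluster can be estimated from the 700 W per-H100 figure quoted in the definition. The sketch below is simple arithmetic; the `overhead_factor` multiplier for host CPUs, networking, and cooling is a hypothetical parameter, and the 1.5 used in the demo is an assumed value, not a measured one.

```python
H100_TDP_WATTS = 700      # per-GPU TDP quoted in this article
CLUSTER_GPUS = 24_576     # cluster size quoted in this article


def gpu_power_mw(num_gpus: int, watts_per_gpu: float = H100_TDP_WATTS) -> float:
    """Total GPU power in megawatts, counting only the GPUs themselves."""
    return num_gpus * watts_per_gpu / 1_000_000


def facility_power_mw(num_gpus: int, overhead_factor: float) -> float:
    """Scale GPU power by an assumed multiplier for hosts, network, and cooling."""
    return gpu_power_mw(num_gpus) * overhead_factor


if __name__ == "__main__":
    print(f"GPUs only: {gpu_power_mw(CLUSTER_GPUS):.1f} MW")
    # 1.5 is an illustrative assumption, not a published figure.
    print(f"With assumed 1.5x overhead: {facility_power_mw(CLUSTER_GPUS, 1.5):.1f} MW")
```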

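OpenAI's GPT-4 training reportedly used ~25,000 GPUs across multiple clusters; published reports describe these as Ampere-generation A100s rather than H100s, which makes GPT-4 a baseline for comparison rather than a Hopper deployment.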

NVIDIA's H200 GPU pairs 141 GB of HBM3e with 4.8 TB/s of memory bandwidth, enabling single-GPU inference of 70B-parameter models (for example, with FP8 weights) without tensor parallelism.
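The single-GPU claim can be checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory for a 70B-parameter model and a rough bandwidth-bound ceiling on decode speed. The 80 GB H100 capacity is the standard H100 SXM figure; `reserve_gb` (headroom for KV cache and activations) is a hypothetical allowance, and the decode estimate assumes every weight is read once per generated token at batch size 1, which ignores KV-cache traffic and kernel overheads.

```python
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}
GPU_MEMORY_GB = {"H100": 80, "H200": 141}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_one_gpu(num_params: float, precision: str, gpu: str,
                    reserve_gb: float = 16.0) -> bool:
    """True if weights plus an assumed reserve fit in a single GPU's memory."""
    return weight_footprint_gb(num_params, precision) + reserve_gb <= GPU_MEMORY_GB[gpu]


def max_decode_tokens_per_second(weight_gb: float, bandwidth_tb_s: float) -> float:
    """Upper bound on batch-1 decode speed if each token reads all weights once."""
    return bandwidth_tb_s * 1000.0 / weight_gb


if __name__ == "__main__":
    params = 70e9
    for precision in ("fp16", "fp8"):
        size = weight_footprint_gb(params, precision)
        for gpu in ("H100", "H200"):
            verdict = "fits" if fits_on_one_gpu(params, precision, gpu) else "does not fit"
            print(f"70B @ {precision} ({size:.0f} GB) {verdict} on one {gpu}")
    ceiling = max_decode_tokens_per_second(weight_footprint_gb(params, "fp8"), 4.8)
    print(f"H200 FP8 decode ceiling: {ceiling:.1f} tokens/s")
```

Under these assumptions, FP16 weights (140 GB) leave almost no headroom even on an H200, while FP8 weights (70 GB) fit comfortably, which is why the precision matters for the single-GPU case.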

Anthropic's Claude 3 Opus was trained on H100 clusters with custom infrastructure (according to public statements).

Microsoft Azure ND H100 v5 instances offer 8× H100 GPUs per VM, linked by NVLink inside the VM and by 3.2 Tb/s of InfiniBand (8 × 400 Gb/s) between VMs for distributed training.

FAQ

What is Hopper?

Hopper is NVIDIA's GPU architecture (H100, H200) optimized for large-scale AI training and inference, featuring Transformer Engine (FP8), NVLink/NVSwitch, and up to 141 GB HBM3e memory.

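How does Hopper work?

Hopper accelerates transformer models with its Transformer Engine, which selects FP8 or FP16 precision per layer. Fourth-generation NVLink and NVSwitch connect the GPUs in a node at up to 900 GB/s per GPU, second-generation MIG can partition one GPU into up to 7 instances, and DPX instructions speed up dynamic programming algorithms.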

Where is Hopper used in 2026?

Hopper remains widely deployed in 2026, especially for production inference, where established software stacks (CUDA 12, TensorRT-LLM, vLLM) and lower cost per token keep it in service alongside Blackwell. It is available in cloud instances such as AWS P5, Azure ND H100 v5, and GCP A3, and it powers large training clusters such as Meta's 24,576-GPU H100 clusters.
