The NVIDIA H100 GPU, based on the Hopper architecture announced in March 2022 and shipping in late 2022, is the dominant accelerator for large-scale AI training and inference as of 2026. It succeeds the A100 (Ampere) and introduces several key innovations.

The H100 is fabricated on a custom TSMC 4N process (5nm-class) and contains 80 billion transistors. The full GH100 die carries 18,432 CUDA cores, 576 fourth-generation Tensor Cores, and 60 MB of L2 cache; the shipping SXM5 product enables 16,896 CUDA cores, 528 Tensor Cores, and 50 MB of L2. The H100 SXM variant provides 80 GB of HBM3 memory at 3.35 TB/s of bandwidth, while the PCIe version offers 80 GB of HBM2e at 2.0 TB/s.

A key differentiator is the Transformer Engine, a dedicated hardware-plus-software path that mixes FP8 (8-bit floating point) and FP16, dynamically managing precision layer by layer during transformer training; NVIDIA claims up to 9x faster training than the A100 on models like GPT-3. The H100 also introduces the NVLink Switch System (NVSwitch), which lets up to 256 GPUs communicate at 900 GB/s each and underpins the DGX H100 and DGX SuperPOD configurations. For inference, FP8 and INT8 Tensor Core operations deliver up to 30x higher throughput than the A100 on large language models, per NVIDIA's benchmarks. Multi-Instance GPU (MIG) partitioning splits one H100 into as many as 7 isolated instances for secure multi-tenant workloads.

As of 2026 the H100 remains the workhorse for frontier-scale training, including Meta's Llama 3 family, even as its Blackwell-architecture successors (B100/B200) ship in volume. Common pitfalls include insufficient cooling (the SXM part has a 700 W TDP), memory-bandwidth bottlenecks at very large batch sizes, and the need for optimized CUDA kernels (e.g., FlashAttention-2/3) to keep the Tensor Cores fed. Alternatives include AMD's MI300X (192 GB HBM3, 5.3 TB/s) and Intel's Gaudi 3, though the H100 retains the strongest software ecosystem via CUDA and cuDNN.
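Hopper's FP8 Tensor Cores operate on the two OCP 8-bit floating-point formats, E4M3 (4 exponent bits, 3 mantissa bits, largest finite value 448) and E5M2. As a minimal, illustrative sketch of what 8-bit precision means numerically, here is round-to-nearest E4M3 quantization in pure Python; this models only the number format, not the Transformer Engine API:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (bias 7, max finite 448)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # Normal exponents span [-6, 8]; below 2**-6 spacing stays at the
    # subnormal step 2**-9, so clamping e to -6 handles subnormals too.
    e = max(-6, min(8, math.floor(math.log2(mag))))
    step = 2.0 ** (e - 3)          # 3 mantissa bits -> 8 steps per binade
    q = round(mag / step) * step
    return sign * min(q, 448.0)    # saturate at the largest finite value

print(quantize_e4m3(1.0))    # 1.0 (exactly representable)
print(quantize_e4m3(500.0))  # 448.0 (saturates)
print(quantize_e4m3(0.1))    # 0.1015625 (about 1.6% rounding error)
```

The ~1.6% relative error on 0.1 illustrates why FP8 training needs the per-tensor scaling that the Transformer Engine manages automatically.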
The H100 is often compared to the A100 (Ampere) for cost-sensitive workloads and to the B100 for peak performance.
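The headline "up to 9x" training speedup is a combined hardware-and-software figure; raw peak Tensor Core ratios are smaller, with the rest coming from FP8, larger caches, and faster interconnect. A back-of-envelope comparison using the published dense (no-sparsity) SXM peaks, A100 FP16 at 312 TFLOPS and H100 FP16/FP8 at roughly 989/1,979 TFLOPS:

```python
# Published dense Tensor Core peaks (TFLOPS); SXM parts, no sparsity.
A100_FP16 = 312.0
H100_FP16 = 989.4
H100_FP8 = 1978.9

same_precision = H100_FP16 / A100_FP16   # hardware alone, same precision
fp8_vs_fp16 = H100_FP8 / A100_FP16       # when FP8 replaces FP16

print(f"H100 FP16 vs A100 FP16: {same_precision:.1f}x")  # ~3.2x
print(f"H100 FP8  vs A100 FP16: {fp8_vs_fp16:.1f}x")     # ~6.3x
```

The gap between ~6.3x peak and the claimed 9x end-to-end number is closed (on favorable workloads) by software factors such as Transformer Engine scaling and better kernel utilization.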
Examples
- Meta trained Llama 3.1 405B on up to 16,000 H100 GPUs across its production AI clusters.
- OpenAI's GPT-4 inference reportedly uses H100 clusters with FP8 quantization for real-time chat.
- NVIDIA's DGX H100 system contains 8 H100 SXM GPUs with 640 GB total HBM3 memory.
- The H100's Transformer Engine enables FP8 training of BLOOM-176B with 40% less memory than FP16.
- An H100 SXM delivers up to 1,979 TFLOPS of dense FP8 Tensor Core throughput (3,958 TFLOPS with structured sparsity), compute that generative models such as Stable Diffusion 3 exploit at inference time.
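The BLOOM-176B memory figure above is easy to sanity-check for the weights alone: FP8 halves per-parameter storage versus FP16 (1 byte vs 2), while the overall ~40% saving also reflects activations and optimizer state, which do not all shrink. A back-of-envelope sizing sketch (decimal GB, weights only, deliberately ignoring activations and KV cache):

```python
import math

HBM_PER_GPU = 80e9  # H100 memory capacity in bytes (decimal GB)

def gpus_for_weights(params: float, bytes_per_param: int) -> tuple:
    """Return (weight bytes, minimum GPU count) to hold the weights alone."""
    weights = params * bytes_per_param
    return weights, math.ceil(weights / HBM_PER_GPU)

for name, bpp in [("FP16", 2), ("FP8", 1)]:
    w, n = gpus_for_weights(176e9, bpp)  # BLOOM-176B parameter count
    print(f"{name}: {w / 1e9:.0f} GB of weights -> at least {n} x 80 GB H100s")
```

Even this lower bound shows why FP8 matters at this scale: 352 GB of FP16 weights need five 80 GB cards before any working memory, versus three for FP8.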
Latest news mentioning H100
- CPU Demand Flipping the AI Narrative as Datacenter Growth Shifts: A new analysis from SemiAnalysis indicates CPU demand is rising in AI datacenters, reversing a narrative of GPU-only dominance. This shift signals changing workload patterns and infrastructure priorities.
- The $500B AI Chip Bottleneck: One Material, One Supplier (Apr 28, 2026): A single Japanese chemical company supplies 98% of the thin-film material used in every AI chip on earth. NVIDIA is paying half the capex to expand supplier fabs as lead times stretch past 6 months.
- Vertiv Acquires Strategic Thermal Labs for Liquid Cooling (Apr 28, 2026): Vertiv acquired Strategic Thermal Labs to add cold plate design expertise to its liquid cooling portfolio, addressing the rising thermal demands of AI workloads in data centers.
- OpenAI Breaks Microsoft Exclusivity, Eyes AWS and GCP (Apr 28, 2026): OpenAI is moving away from its exclusive Microsoft cloud arrangement, signaling potential partnerships with Amazon AWS and Google Cloud to diversify infrastructure and reduce dependency.
- Google Splits TPU Line: 8t for Training, 8i for Inference (Apr 28, 2026): At Cloud Next 2026, Google introduced two new AI chips, TPU 8t for training and TPU 8i for inference, splitting its custom silicon for the first time. OpenAI, Anthropic, and Meta are buying multi-gi…
FAQ
What is H100?
H100 is NVIDIA's Hopper-architecture GPU for AI and HPC, featuring 80 GB of HBM3 memory, 3.35 TB/s of bandwidth, and a Transformer Engine for mixed-precision (FP8/FP16) training.
How does H100 work?
The H100 pairs fourth-generation Tensor Cores with a Transformer Engine that dynamically selects FP8 or FP16 precision layer by layer during transformer training. Its 80 GB of HBM3 feed the cores at 3.35 TB/s (SXM variant), fourth-generation NVLink and NVSwitch connect up to 256 GPUs at 900 GB/s each, and MIG partitioning splits one GPU into up to 7 isolated instances. See the full overview above for die-level details.
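Whether a given kernel actually reaches those Tensor Core peaks depends on arithmetic intensity versus the machine balance (peak FLOPS divided by memory bandwidth), which is why bandwidth and fused kernels like FlashAttention matter. A simple roofline-style sketch, assuming the H100 SXM's published dense FP8 peak of ~1,979 TFLOPS and 3.35 TB/s of HBM3 bandwidth:

```python
PEAK_FLOPS = 1978.9e12   # H100 SXM dense FP8 Tensor Core peak, FLOP/s
BANDWIDTH = 3.35e12      # HBM3 bandwidth, bytes/s

# FLOPs a kernel must do per byte moved to avoid waiting on memory.
balance = PEAK_FLOPS / BANDWIDTH

def bound(flops_per_byte: float) -> str:
    """Classify a kernel under the simple roofline model."""
    return "compute-bound" if flops_per_byte >= balance else "memory-bound"

print(f"machine balance: {balance:.0f} FLOPs/byte")  # ~591
print(bound(2.0))     # elementwise ops: memory-bound
print(bound(1000.0))  # large FP8 GEMMs: compute-bound
```

A balance near 591 FLOPs/byte means low-intensity operations (elementwise math, unfused attention) run at memory speed, which is the practical motivation for kernel fusion on this part.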
Where is H100 used in 2026?
Meta trained Llama 3.1 405B on up to 16,000 H100 GPUs across its production AI clusters. OpenAI's GPT-4 inference reportedly uses H100 clusters with FP8 quantization for real-time chat. NVIDIA's DGX H100 system contains 8 H100 SXM GPUs with 640 GB total HBM3 memory.