H100 — Definition, Examples & Latest News | gentic.news

The NVIDIA H100 GPU, based on the Hopper architecture announced in March 2022 and shipping from late 2022, is the dominant accelerator for large-scale AI training and inference as of 2026. It succeeds the A100 and is fabricated on a custom TSMC 4N process (5nm-class) with 80 billion transistors.

The full GH100 die contains 18,432 CUDA cores, 576 fourth-generation Tensor Cores, and 60 MB of L2 cache. Shipping products enable a subset: the H100 SXM exposes 16,896 CUDA cores, 528 Tensor Cores, and 50 MB of L2. The SXM variant provides 80 GB of HBM3 at 3.35 TB/s, while the PCIe version offers 80 GB of HBM2e at 2.0 TB/s.

A key differentiator is the Transformer Engine, which combines FP8-capable Tensor Cores with software that chooses between FP8 and 16-bit precision layer by layer and rescales tensors so their values fit FP8's narrow range. NVIDIA cites up to 9x faster training than the A100 on large transformer models. Fourth-generation NVLink gives each GPU 900 GB/s of total bandwidth; NVSwitch chips connect the eight GPUs inside a DGX H100, and the NVLink Switch System extends NVLink across up to 256 GPUs, the basis of DGX SuperPOD configurations.

For inference, the H100 supports FP8 and INT8 Tensor Core operations, and NVIDIA cites up to 30x higher inference throughput than the A100 on large language models. Multi-Instance GPU (MIG) partitioning splits one H100 into as many as 7 isolated instances for secure multi-tenant workloads.

As of 2026, the H100 remains a workhorse for training frontier models such as Llama 3, even though its successor architecture, Blackwell (B100/B200), began shipping in late 2024.

Common pitfalls include insufficient cooling and power delivery (the SXM module's TDP is up to 700 W); memory-bandwidth bottlenecks in small-batch inference, where every decode step must stream all model weights from HBM; memory-capacity limits at very large batch sizes; and the need for Hopper-optimized kernels (e.g., FlashAttention-3, or NVIDIA's Transformer Engine library for FP8) to approach peak throughput.

Alternatives include AMD's MI300X (192 GB HBM3, 5.3 TB/s) and Intel's Gaudi 3, though the H100 retains the strongest software ecosystem through CUDA and cuDNN.
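The per-tensor scaling that lets the Transformer Engine train in FP8 can be sketched in a few lines of Python. This is a simplified emulation, not Transformer Engine code: `round_to_e4m3`, `quantize_per_tensor`, and `dequantize` are hypothetical helpers that mimic the E4M3 format (3 mantissa bits, maximum 448) using ordinary floats, and the scale is computed from the current tensor's absolute maximum rather than from the amax history kept by Transformer Engine's delayed-scaling recipe.

```python
import math

E4M3_MAX = 448.0      # largest finite value in the FP8 E4M3 format
E4M3_MIN_EXP = -6     # exponent of the smallest normal E4M3 value (2**-6)
MANTISSA_BITS = 3     # explicit mantissa bits in E4M3


def round_to_e4m3(x: float) -> float:
    """Emulate rounding x to the nearest E4M3 value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    _, e = math.frexp(mag)               # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)       # floor(log2(mag)); subnormals share the min-normal step
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values in this binade
    q = round(mag / step) * step         # round() is round-half-to-even
    return sign * min(q, E4M3_MAX)


def quantize_per_tensor(values, margin: int = 0):
    """Scale a tensor so its amax maps to the FP8 maximum, then round each element."""
    amax = max(abs(v) for v in values)
    scale = (E4M3_MAX / amax) / (2 ** margin) if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def dequantize(q_values, scale):
    """Undo the per-tensor scale to recover approximate original values."""
    return [v / scale for v in q_values]


acts = [0.001, -0.02, 0.5, 3.0, -12.5]
q, scale = quantize_per_tensor(acts)
print(scale)                 # 448 / 12.5 = 35.84
print(dequantize(q, scale))
```

Because E4M3 keeps only 3 mantissa bits, each normal value carries at most about 6% (1/16) relative rounding error. A single large outlier inflates amax and pushes small values toward the subnormal range, which is why the scale is tracked per tensor rather than fixed globally.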
The H100 is often compared to the A100 (Ampere) for cost-sensitive workloads and to the B100 for peak performance.
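The memory-bandwidth pitfall for small-batch inference can be quantified with a simple roofline model. The sketch below assumes H100 SXM peaks of about 989 TFLOPS dense BF16 and 3.35 TB/s HBM3, and models a decode step as reading every weight once while performing 2 FLOPs per weight per sequence. KV-cache and activation traffic are ignored, and all function names are hypothetical.

```python
# Assumed H100 SXM peak figures (dense, no sparsity).
PEAK_BF16_FLOPS = 989e12   # FLOP/s, dense BF16 Tensor Core
HBM_BANDWIDTH = 3.35e12    # bytes/s, HBM3


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being bandwidth-bound."""
    return peak_flops / bandwidth


def attainable_flops(intensity: float,
                     peak_flops: float = PEAK_BF16_FLOPS,
                     bandwidth: float = HBM_BANDWIDTH) -> float:
    """Roofline model: throughput is capped by compute or by bandwidth * intensity."""
    return min(peak_flops, bandwidth * intensity)


def decode_intensity(batch_size: int, bytes_per_param: float) -> float:
    """Hypothetical decode model: 2 FLOPs per weight per sequence, each weight read once.
    KV-cache and activation traffic are ignored."""
    return 2 * batch_size / bytes_per_param


def max_decode_tokens_per_s(n_params: float, bytes_per_param: float,
                            bandwidth: float = HBM_BANDWIDTH) -> float:
    """Upper bound on tokens/s for one sequence when every step streams all weights."""
    return bandwidth / (n_params * bytes_per_param)


print(ridge_point(PEAK_BF16_FLOPS, HBM_BANDWIDTH))      # ~295 FLOP/byte
for batch in (1, 64, 512):
    tflops = attainable_flops(decode_intensity(batch, bytes_per_param=2)) / 1e12
    print(batch, round(tflops, 1))                      # attainable TFLOP/s in BF16
print(max_decode_tokens_per_s(8e9, 2))                  # 8B-parameter model in BF16
```

At batch size 1 in BF16 the intensity is about 1 FLOP/byte, so the GPU can use only about 3.35 of its 989 TFLOPS; the ridge point of roughly 295 FLOP/byte is why serving systems batch many sequences together.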

Examples

Meta trained Llama 3.1 405B on more than 16,000 H100 GPUs in its custom-built GPU training clusters.

OpenAI's GPT-4 inference reportedly uses H100 clusters with FP8 quantization for real-time chat.

NVIDIA's DGX H100 system contains 8 H100 SXM GPUs with 640 GB total HBM3 memory.

The H100's Transformer Engine enables FP8 training of BLOOM-176B with 40% less memory than FP16.

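An H100 SXM peaks at about 989 TFLOPS of dense FP16/BF16 Tensor Core throughput; FP8 doubles that to about 1,979 TFLOPS dense, or 3,958 TFLOPS with 2:4 structured sparsity, figures that matter for training large generative models such as Stable Diffusion 3.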
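The sparse Tensor Core figures assume weights follow NVIDIA's 2:4 structured-sparsity pattern: at most two nonzero values in every contiguous group of four. The pure-Python sketch below illustrates magnitude-based pruning into that pattern; `prune_2_4` and `is_2_4_sparse` are hypothetical helpers, not NVIDIA tooling, and pruned models are usually fine-tuned afterwards to recover accuracy.

```python
def prune_2_4(row):
    """Keep the two largest-magnitude values in each group of four; zero the rest."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest |values| in this group (ties keep the earlier index).
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned


def is_2_4_sparse(row):
    """True if every contiguous group of four has at most two nonzero entries."""
    return len(row) % 4 == 0 and all(
        sum(1 for v in row[i:i + 4] if v != 0) <= 2 for i in range(0, len(row), 4)
    )


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]
sparse = prune_2_4(weights)
print(sparse)                 # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(is_2_4_sparse(sparse))  # True
```

Because half of every group is guaranteed zero, the hardware can skip those multiplies, which is where the 2x gap between the dense and sparse throughput figures comes from.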

FAQ

What is H100?

H100 is NVIDIA's Hopper-architecture GPU for AI and HPC. The SXM variant has 80 GB of HBM3 memory with 3.35 TB/s of bandwidth, and every variant includes the Transformer Engine for FP8/16-bit mixed-precision training.
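A quick sizing check shows what 80 GB per GPU means for large models. The sketch counts memory in decimal GB for the per-parameter state you pass in, ignoring KV cache and activations; `weight_gb` and `min_gpus` are hypothetical helpers, and the 16 bytes/parameter figure is the common rule of thumb for mixed-precision Adam (16-bit weights and gradients plus FP32 master weights and two moment estimates), an assumption rather than a measured value.

```python
import math

H100_MEMORY_GB = 80  # HBM capacity per 80 GB H100, in decimal GB


def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the per-parameter state alone, in decimal GB."""
    return n_params * bytes_per_param / 1e9


def min_gpus(n_params: float, bytes_per_param: float, gpu_gb: float = H100_MEMORY_GB) -> int:
    """Lower bound on GPUs needed just to hold the given per-parameter state."""
    return math.ceil(weight_gb(n_params, bytes_per_param) / gpu_gb)


params = 405e9  # Llama 3.1 405B
print(weight_gb(params, 2), min_gpus(params, 2))  # 810.0 11  (FP16/BF16 weights)
print(weight_gb(params, 1), min_gpus(params, 1))  # 405.0 6   (FP8 weights)
print(min_gpus(params, 16))                       # 81        (mixed-precision Adam state)
```

In 16-bit precision the 405B model's weights alone exceed the 640 GB of a single 8-GPU DGX H100, while FP8 weights fit; training state pushes the minimum far higher before any activation memory is counted.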

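How does H100 work?

H100 runs large matrix multiplications on fourth-generation Tensor Cores, which the Transformer Engine drives in FP8 or 16-bit precision layer by layer, rescaling tensors to fit FP8's range. HBM3 memory feeds the Tensor Cores at up to 3.35 TB/s (SXM), and NVLink with NVSwitch links GPUs at 900 GB/s each so that training and inference can be split across many GPUs.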

Where is H100 used in 2026?

Meta trained Llama 3.1 405B on more than 16,000 H100 GPUs in its custom-built GPU training clusters. OpenAI's GPT-4 inference reportedly uses H100 clusters with FP8 quantization for real-time chat. NVIDIA's DGX H100 system contains 8 H100 SXM GPUs with 640 GB of total HBM3 memory.
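The Llama 3.1 example can be sanity-checked with the common 6 x N x D estimate of dense transformer training compute (N parameters, D tokens). The sketch assumes about 15 trillion training tokens, a dense BF16 peak of 989 TFLOPS per H100 SXM, a hypothetical 40% model-FLOPs utilization (MFU), and no downtime for failures or restarts; `training_flops` and `training_days` are hypothetical helpers.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Common approximation for dense transformer training compute: 6 * N * D."""
    return 6 * n_params * n_tokens


def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, mfu: float) -> float:
    """Wall-clock days if the cluster sustains `mfu` of its peak throughput throughout."""
    sustained = n_gpus * peak_flops_per_gpu * mfu
    return training_flops(n_params, n_tokens) / sustained / SECONDS_PER_DAY


days = training_days(
    n_params=405e9,             # Llama 3.1 405B
    n_tokens=15e12,             # assumption: ~15 trillion training tokens
    n_gpus=16_000,              # H100 count from the example above
    peak_flops_per_gpu=989e12,  # assumed dense BF16 peak per H100 SXM
    mfu=0.40,                   # hypothetical model-FLOPs utilization
)
print(round(days, 1))           # 66.7
```

Under these assumptions the run takes a little over two months; halving MFU doubles the time, which is why kernel efficiency on the H100 matters as much as GPU count.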
