The NVIDIA H100 GPU, based on the Hopper architecture announced in March 2022 and shipping in late 2022, is the dominant accelerator for large-scale AI training and inference as of 2026. It succeeds the A100 and introduces several key innovations.
The H100 is fabricated on a custom TSMC 4N process (5nm-class) and contains 80 billion transistors. The full GH100 die has 18,432 CUDA cores, 576 fourth-generation Tensor Cores, and 60 MB of L2 cache; the shipping H100 SXM enables 16,896 CUDA cores, 528 Tensor Cores, and 50 MB of L2 cache. The H100 SXM variant provides 80 GB of HBM3 memory with 3.35 TB/s bandwidth, while the PCIe version offers 80 GB of HBM2e at 2.0 TB/s.
A key differentiator is the Transformer Engine, which combines Hopper Tensor Core hardware with software that manages FP8 (8-bit floating point) and FP16 precision dynamically during transformer model training. NVIDIA cites up to 9x faster training than A100; this is a best-case, cluster-scale figure for its largest benchmarked models rather than a typical per-GPU speedup for models like GPT-3 and BERT. The H100 also introduces fourth-generation NVLink at 900 GB/s per GPU and the NVLink Switch System, which extends NVSwitch-based connectivity to up to 256 GPUs and underpins the DGX H100 and DGX SuperPOD configurations. For inference, the H100 supports FP8 and INT8 Tensor Core operations, and NVIDIA cites up to 30x higher inference throughput than A100 on its largest language-model benchmark. The H100's Multi-Instance GPU (MIG) partitioning allows up to 7 instances per GPU for secure multi-tenant workloads.
As of 2026, the H100 remains a workhorse for training frontier models such as Llama 3, though its successor, the B100 (Blackwell architecture), was announced in March 2024 and Blackwell systems have since begun shipping. Common pitfalls include insufficient cooling (the SXM module has a TDP of up to 700 W), memory-bandwidth bottlenecks in low-arithmetic-intensity work such as small-batch inference, memory-capacity limits at very large batch sizes, and the need for optimized CUDA kernels (e.g., FlashAttention-2) to keep the Tensor Cores fully utilized. Alternatives include AMD's MI300X (192 GB HBM3, 5.3 TB/s) and Intel's Gaudi 3, though H100 retains the strongest software ecosystem via CUDA and cuDNN.
The H100 is often compared to the A100 (Ampere) for cost-sensitive workloads and to the B100 for peak performance.
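The memory-bandwidth pitfall can be made concrete with a roofline-style estimate. The following is a minimal sketch, not a benchmark: it assumes a dense BF16 Tensor Core peak of about 989 TFLOPS for the H100 SXM (a datasheet peak, not a measured value) together with the 3.35 TB/s HBM3 bandwidth, and it models a single matrix multiply while ignoring caches and kernel overheads. The function names and the layer size are illustrative choices.

```python
# Roofline-style estimate for one matrix multiply on an H100 SXM.
# Both peak values are assumptions based on published datasheet peaks.
PEAK_BF16_FLOPS = 989e12   # dense BF16 Tensor Core peak, FLOP/s (assumed)
PEAK_BANDWIDTH = 3.35e12   # HBM3 bandwidth, bytes/s


def ridge_point(peak_flops: float = PEAK_BF16_FLOPS,
                bandwidth: float = PEAK_BANDWIDTH) -> float:
    """FLOPs per byte above which compute, not memory, is the limit."""
    return peak_flops / bandwidth


def gemm_intensity(batch: int, d_in: int, d_out: int,
                   bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity of y = x @ W, with x of shape (batch, d_in)."""
    flops = 2 * batch * d_in * d_out
    # Read x and W once, write y once (idealized, no cache reuse modeled).
    bytes_moved = bytes_per_elem * (batch * d_in + d_in * d_out + batch * d_out)
    return flops / bytes_moved


def attainable_flops(intensity: float,
                     peak_flops: float = PEAK_BF16_FLOPS,
                     bandwidth: float = PEAK_BANDWIDTH) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * bandwidth)


def is_memory_bound(batch: int, d_in: int, d_out: int) -> bool:
    return gemm_intensity(batch, d_in, d_out) < ridge_point()


if __name__ == "__main__":
    print(f"ridge point: {ridge_point():.0f} FLOP/byte")
    for batch in (1, 64, 4096):
        intensity = gemm_intensity(batch, 8192, 8192)
        bound = "memory" if is_memory_bound(batch, 8192, 8192) else "compute"
        tflops = attainable_flops(intensity) / 1e12
        print(f"batch={batch}: {intensity:.1f} FLOP/byte, "
              f"{bound}-bound, at most {tflops:.0f} TFLOPS")
```

Under these assumptions, a batch of 1 through an 8192 x 8192 layer moves roughly one byte per FLOP and is limited by HBM bandwidth, while a batch of 4,096 clears the ridge point and is limited by Tensor Core throughput instead.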
Examples
- Meta trained Llama 3.1 405B on more than 16,000 H100 GPUs in its production training clusters.
- OpenAI's GPT-4 inference reportedly uses H100 clusters with FP8 quantization for real-time chat.
- NVIDIA's DGX H100 system contains 8 H100 SXM GPUs with 640 GB total HBM3 memory.
- The H100's Transformer Engine enables FP8 training of BLOOM-176B with 40% less memory than FP16.
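- An H100 SXM peaks at about 1,979 TFLOPS of dense FP8 Tensor Core throughput, or about 3,958 TFLOPS with structured sparsity (989 TFLOPS is the dense FP16/BF16 peak), which benefits models like Stable Diffusion 3.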
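The memory figures in these examples follow from simple arithmetic. The sketch below is a rough, weights-only estimate under stated assumptions: it treats 1 GB as 10^9 bytes and ignores activations, optimizer state, gradients, and KV cache, which is why real training savings from FP8 are smaller than the halving of weight storage shown here. The function names are illustrative.

```python
import math

GPU_MEMORY_GB = 80   # HBM capacity per H100
GPUS_PER_DGX = 8     # H100 SXM GPUs in one DGX H100

# Storage size of one parameter in each format, in bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus_for_weights(num_params: float, dtype: str,
                         gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Fewest GPUs whose combined memory holds the weights (nothing else)."""
    return math.ceil(weight_memory_gb(num_params, dtype) / gpu_memory_gb)


if __name__ == "__main__":
    print(f"DGX H100 total memory: {GPU_MEMORY_GB * GPUS_PER_DGX} GB")
    for name, params in (("176B model", 176e9), ("405B model", 405e9)):
        for dtype in ("fp16", "fp8"):
            gb = weight_memory_gb(params, dtype)
            gpus = min_gpus_for_weights(params, dtype)
            print(f"{name} in {dtype}: {gb:.0f} GB of weights, "
                  f"at least {gpus} GPUs")
```

Real deployments need considerably more headroom than this lower bound, since the non-weight terms often dominate during training.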
FAQ
What is H100?
H100 is NVIDIA's Hopper-architecture GPU for AI and HPC. The SXM variant features 80 GB of HBM3 memory with 3.35 TB/s bandwidth, and both variants include the Transformer Engine for mixed-precision training.
How does H100 work?
The H100 accelerates the matrix math behind neural networks with fourth-generation Tensor Cores. Its Transformer Engine manages FP8 and FP16 precision dynamically during transformer training, HBM3 memory supplies up to 3.35 TB/s of bandwidth on the SXM variant, and NVLink connects GPUs at 900 GB/s each for multi-GPU training and inference.
Where is H100 used in 2026?
Meta trained Llama 3.1 405B on more than 16,000 H100 GPUs in its production training clusters. OpenAI's GPT-4 inference reportedly uses H100 clusters with FP8 quantization for real-time chat. NVIDIA's DGX H100 system contains 8 H100 SXM GPUs with 640 GB total HBM3 memory.