Liquid cooling is an infrastructure technology that removes heat from computing hardware using a liquid coolant instead of air. As AI models grow (the largest training clusters now exceed 100,000 accelerators), air cooling becomes impractical at the resulting power densities. A single NVIDIA DGX H100 system can draw up to 10.2 kW, and racks in modern AI clusters often exceed 40–50 kW, far beyond the roughly 15–20 kW practical limit for air cooling.
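As a quick sanity check on those densities, the sketch below totals the power of a hypothetical rack of DGX H100-class systems against the air-cooling limit quoted above; the packing density is an illustrative assumption, not a vendor configuration.

```python
# Back-of-the-envelope rack power (illustrative assumptions, not vendor specs).
SYSTEM_POWER_KW = 10.2       # max draw of one DGX H100-class system
SYSTEMS_PER_RACK = 4         # assumed packing density
AIR_COOLING_LIMIT_KW = 20.0  # upper end of the practical air-cooled range

rack_power_kw = SYSTEM_POWER_KW * SYSTEMS_PER_RACK
excess_kw = rack_power_kw - AIR_COOLING_LIMIT_KW
print(f"Rack power: {rack_power_kw:.1f} kW, {excess_kw:.1f} kW over the air limit")
```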
How it works: Liquid cooling systems fall into two main categories: direct-to-chip (cold plate) and immersion. In direct-to-chip cooling, a cold plate is mounted directly on high-heat components (GPUs, CPUs, memory). Coolant, usually water or a water-glycol mixture, flows through channels in the cold plate, picking up heat conducted through the plate from the component. The warmed coolant then travels to a heat exchanger or cooling tower, where the heat is rejected. Immersion cooling submerges entire servers in a dielectric (non-conductive) fluid, such as 3M Novec or engineered hydrocarbon oils. Heat transfers directly from components to the fluid, which is then pumped through a heat exchanger. Single-phase immersion keeps the fluid liquid throughout; two-phase immersion lets the fluid boil at the component surface, exploiting the latent heat of vaporization for much higher heat removal per unit of fluid.
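For single-phase cold plates, the governing relation is Q = ṁ · c_p · ΔT: heat absorbed equals coolant mass flow times specific heat times the temperature rise across the plate. Below is a minimal sizing sketch; the water properties are standard, but the 700 W load and 10 °C rise are illustrative assumptions rather than figures from any product.

```python
# Minimal single-phase cold-plate sizing sketch using Q = m_dot * c_p * dT.
CP_WATER = 4186.0   # J/(kg*K), specific heat of water
RHO_WATER = 997.0   # kg/m^3, density of water near room temperature

def required_flow_lpm(heat_load_w: float, delta_t_k: float) -> float:
    """Coolant flow (L/min) needed to carry heat_load_w with a delta_t_k
    rise between cold-plate inlet and outlet."""
    mass_flow_kg_s = heat_load_w / (CP_WATER * delta_t_k)  # m_dot = Q / (c_p * dT)
    return mass_flow_kg_s / RHO_WATER * 1000.0 * 60.0      # m^3/s -> L/min

# A 700 W GPU with an assumed 10 K coolant rise needs about 1 L/min.
print(f"{required_flow_lpm(700.0, 10.0):.2f} L/min")
```

Two-phase immersion sidesteps this flow requirement: boiling absorbs the fluid's latent heat of vaporization, which is far larger per kilogram than the sensible heat captured by a modest temperature rise.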
Why it matters: Liquid cooling enables higher compute density—racks can exceed 100 kW—reducing floor space and cabling complexity. It also lowers total cost of ownership (TCO) by reducing fan power (often 10–20% of total data center power) and allowing higher ambient temperatures. For example, Google reported a 40% reduction in cooling energy using liquid cooling in its TPU v4 pods. Additionally, liquid cooling can improve hardware reliability by maintaining more uniform temperatures and reducing thermal cycling.
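To make the fan-power figure concrete, here is a toy savings calculation; the facility size, the 15% fan fraction (midpoint of the range above), and the 80% reduction are all illustrative assumptions, not measurements from any operator.

```python
# Toy fan-power savings estimate (all inputs are illustrative assumptions).
TOTAL_POWER_MW = 10.0   # assumed data center power draw
FAN_FRACTION = 0.15     # fans at ~10-20% of total power; take the midpoint
FAN_REDUCTION = 0.80    # assume liquid cooling eliminates 80% of fan power

fan_power_mw = TOTAL_POWER_MW * FAN_FRACTION
savings_mw = fan_power_mw * FAN_REDUCTION
annual_savings_mwh = savings_mw * 8760  # hours per year
print(f"Fan power: {fan_power_mw:.1f} MW; saved: {savings_mw:.1f} MW "
      f"(~{annual_savings_mwh:,.0f} MWh/year)")
```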
When used vs alternatives: Air cooling remains adequate for inference workloads with lower power draw (e.g., a single H100 at 700 W) or for small clusters. For training runs exceeding 1,000 GPUs, liquid cooling is now standard. Hyperscalers like Microsoft, Google, and AWS use direct-to-chip cooling for their largest AI clusters. Immersion is less common due to higher upfront cost and maintenance complexity, but vendors such as Submer and GRC supply it for specialized deployments.
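The thresholds in this section can be read as a rough selection heuristic, sketched below; the cutoffs come from the figures quoted here and are not an industry standard.

```python
# Rough cooling-selection heuristic built from the thresholds quoted above.
# The cutoffs are illustrative, not an industry standard.
def suggest_cooling(rack_power_kw: float, cluster_gpus: int) -> str:
    if rack_power_kw <= 20 and cluster_gpus < 1000:
        return "air"                    # within the practical air-cooled range
    if rack_power_kw <= 100:
        return "direct-to-chip liquid"  # standard for large training clusters
    return "immersion"                  # extreme density or specialized sites

print(suggest_cooling(rack_power_kw=8, cluster_gpus=64))     # -> air
print(suggest_cooling(rack_power_kw=50, cluster_gpus=4096))  # -> direct-to-chip liquid
```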
Common pitfalls: Leak risk is the primary concern—a single coolant leak can destroy millions of dollars of hardware. Modern systems use dry-break connectors, pressure sensors, and leak detection cables. Another pitfall is coolant chemistry: untreated water can cause corrosion or biological growth. Proper water treatment with corrosion inhibitors and biocide is essential. Immersion cooling adds challenges for drive maintenance and cable management, as SSDs and NICs must be rated for submerged operation.
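Below is a minimal sketch of the pressure-based leak monitoring described above; the readings are simulated, and a production system would poll real loop telemetry and combine it with leak-detection cables and dry-break connectors.

```python
# Sketch of a coolant-loop leak monitor based on pressure drop.
# Readings are simulated; the 15 kPa threshold is an assumed value.
from typing import Iterable

PRESSURE_DROP_THRESHOLD_KPA = 15.0  # assumed alarm threshold

def detect_leak(readings_kpa: Iterable[float]) -> bool:
    """Flag a possible leak when loop pressure falls well below baseline."""
    readings = iter(readings_kpa)
    baseline = next(readings)
    for pressure in readings:
        if baseline - pressure > PRESSURE_DROP_THRESHOLD_KPA:
            return True  # sustained drop: isolate the loop and alert
    return False

# Simulated trace: stable loop, then a drop consistent with a leak.
print(detect_leak([250.0, 249.5, 248.0, 230.0]))  # -> True
```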
Current state of the art (2026): Liquid cooling is now mainstream for AI training infrastructure. NVIDIA's reference architecture for Blackwell-based clusters mandates liquid cooling. Google's TPU v5p and AWS's Trainium2 both use direct-to-chip liquid cooling. Two-phase immersion is gaining traction for edge AI where noise and dust are concerns. Industry consortiums like the Open Compute Project (OCP) have published standards for liquid cooling interfaces, driving interoperability. Coolant prices have dropped as synthetic dielectric fluids become more common. The next frontier is on-chip microfluidic cooling, where coolant channels are integrated directly into the silicon substrate, demonstrated by research groups at Georgia Tech and IBM.