Tool · Capacity Planning

Power, cooling, space. Pick the bottleneck.

Real facilities die from one of three constraints first. You set the limits, you add workloads, you watch which one binds. Then you learn how to fix it without wasting the other two.
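The whole model is small enough to sketch in a few lines of Python. This is a minimal sketch, assuming the default limits shown below and treating cooling load as roughly equal to IT power draw; the names are illustrative, not taken from the tool:

```python
# Minimal sketch of the model: three independent budgets, and whichever one a
# workload mix exhausts first is the binding constraint. Limits are the
# defaults shown further down; cooling load ~= IT power is an assumption.

FACILITY = {"power_kw": 2000, "cooling_kw": 1800, "space_u": 420}

def utilization(workloads):
    """workloads: list of (power_kw, space_u) tuples for everything installed."""
    power = sum(p for p, _ in workloads)
    space = sum(u for _, u in workloads)
    return {
        "power_kw": power / FACILITY["power_kw"],
        "cooling_kw": power / FACILITY["cooling_kw"],  # heat out ~= power in
        "space_u": space / FACILITY["space_u"],
    }

def binding_constraint(workloads):
    util = utilization(workloads)
    return max(util, key=util.get)

# Twenty 8 kW / 6U GPU servers: space fills fastest with these defaults.
print(binding_constraint([(8, 6)] * 20))  # -> "space_u"
```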

🏭 Facility limits (your data hall)

📦 Add workloads

H100 server (8-GPU) · 8 kW · 6U

Standard 8× H100 SXM in a 6U chassis. Air-coolable up to ~70 kW/rack.

B200 server (8-GPU) · 11 kW · 6U

Blackwell generation, liquid-cooled. Higher power draw, but better perf/watt.

GB200 NVL72 rack · 120 kW · 42U

Full rack: 72 Blackwell GPUs in a single liquid-cooled NVLink domain.

Storage server (1U NVMe) · 0.5 kW · 1U

All-flash storage node. Minimal heat.

ToR switch (InfiniBand 400G) · 1.5 kW · 1U

Top-of-rack 32-port leaf switch. Runs hot, but a small footprint.

CPU server (2U dual-socket) · 0.8 kW · 2U

General-purpose hosts for head nodes, the control plane, and schedulers.
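
For reference, the catalog above can be written as data in the same hypothetical (power_kw, space_u) form, which also makes it easy to see how a mixed pod shifts the totals:

```python
# The workload cards above as (power_kw, space_u) pairs; the key names are
# made up here, not taken from the tool.
CATALOG = {
    "h100_server": (8.0, 6),
    "b200_server": (11.0, 6),
    "gb200_nvl72": (120.0, 42),
    "storage_1u": (0.5, 1),
    "tor_switch": (1.5, 1),
    "cpu_server": (0.8, 2),
}

# A small mixed pod: two GPU servers, four storage nodes, one ToR switch,
# and one head node.
pod = ([CATALOG["h100_server"]] * 2
       + [CATALOG["storage_1u"]] * 4
       + [CATALOG["tor_switch"]]
       + [CATALOG["cpu_server"]])

total_kw = sum(p for p, _ in pod)  # 20.3 kW
total_u = sum(u for _, u in pod)   # 19 U
print(total_kw, total_u)
```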

📊 Capacity utilization

Power:   0 / 2,000 kW (0%)
Cooling: 0 / 1,800 kW (0%)
Space:   0 / 420 U (0%)

📐 Binding constraint: Power

Your power is at 0% — when you add more, this is the dimension that runs out first.

Stranded capacity in Cooling: 100% unused (0% utilized vs 0% on the binding constraint)

Stranded capacity in Space (U): 100% unused (0% utilized vs 0% on the binding constraint)

📚 What is "stranded capacity"?

A facility has three independent constraints: power, cooling, and space. They don't deplete at the same rate — one always binds first. Whatever capacity you have in the OTHER two dimensions is "stranded" — physically there but unusable.
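
In code, stranded capacity falls out of the same utilization numbers. A sketch of one way to estimate it, assuming the current mix keeps being added until the binding dimension is full; the function and the example figures are illustrative:

```python
# Stranded capacity: headroom in the non-binding dimensions that the binding
# dimension will never let you use. Illustrative sketch, not the tool's code.

def stranded(util):
    """util maps dimension -> fraction used, e.g. {"power": 0.9, ...}."""
    binding = max(util, key=util.get)
    stranded_frac = {}
    for dim, frac in util.items():
        if dim == binding:
            continue
        # Scale this dimension's usage to the point where the binding
        # dimension hits 100% (assumes the same mix keeps being added);
        # whatever is left over at that point is stranded.
        used_when_bound = frac / util[binding] if util[binding] else 0.0
        stranded_frac[dim] = 1.0 - used_when_bound
    return binding, stranded_frac

# Power 90% used, cooling 60%, space 45%: power binds, and roughly a third of
# the cooling plant and half of the rack space can never be used.
print(stranded({"power": 0.90, "cooling": 0.60, "space": 0.45}))
```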

Real DC operators obsess over reducing stranded capacity. Common tactics:

  • Mix workload types to balance density (low-power storage stretches the power budget; dense GPU nodes stretch the space budget)
  • Upgrade only the binding dimension (if you're cooling-bound, add chillers, not more racks)
  • Match cooling capacity to the power actually installed, not to nameplate ratings

💡 Try: fill the facility with H100 servers only. Then with NVL72 racks only. See how the binding constraint switches.
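
The same experiment can be run outside the tool. The sketch below counts how many units of a single workload type fit and reports which dimension caps the count; the limits are the defaults from the panel above, and whether the constraint actually switches depends on the limits you set:

```python
# How many of one workload type fit, and which dimension runs out first?
# Limits are the defaults from the facility panel; edit them to match yours.
FACILITY = {"power_kw": 2000, "cooling_kw": 1800, "space_u": 420}

def max_count(power_kw, space_u):
    fits_per_dim = {
        "power_kw": FACILITY["power_kw"] // power_kw,
        "cooling_kw": FACILITY["cooling_kw"] // power_kw,  # heat ~= power
        "space_u": FACILITY["space_u"] // space_u,
    }
    cap = min(fits_per_dim, key=fits_per_dim.get)
    return int(fits_per_dim[cap]), cap

print("H100 servers only:", max_count(8, 6))
print("NVL72 racks only: ", max_count(120, 42))
```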