Practice

Stop reading. Start designing.

Theory is necessary but not sufficient. Below is an interactive AI data center designer that puts the lessons into your hands. Pick scale, hardware, cooling, redundancy — see the numbers a real design team would see. Below that: cloud labs where you can actually rent GPUs and incident scenarios that walk through real decisions.

🛠️ Data Center Designer

Live calculation — no signup, no save state, just play

⚙️ Configuration

Facility power: 100 MW (slider range: 1 MW to 2.5 GW)

Rack: GB200 NVL72. 72 B200 GPUs + 36 Grace CPUs in one liquid-cooled rack, forming a single NVLink domain. Cost shown is per-GPU equivalent.

Cooling: direct-to-chip liquid. Cold plates on the chips; required for GB200 NVL72 racks and standard for modern AI builds.

Location: hyperscaler favorite. Cool climate means great PUE; mid-carbon grid mix.

📊 Live results

GPUs: 54,347 (across ~754 racks)
Total capex: $3.65B (incl. silicon)
Annual opex: $196.9M ($48.2M of it on power)
PUE: 1.10 (lower = more efficient)
EFLOPS (FP8): 244.56 (theoretical peak)
CO₂/yr: 262,800 t (based on grid mix)

Capex breakdown

GPUs / accelerators: $2.23B
Facility shell + power: $1.08B
Cooling system: $130.0M
Networking: $217.4M

🚂 Training capability (rough estimate)

Llama-3.1 70B from scratch: 2.8 hours
Llama-3.1 405B from scratch: 1.2 days

Assumes 40% MFU. Actual times vary widely with code efficiency, communication overhead, and restarts.
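
These estimates follow from the standard approximation that training FLOPs ≈ 6 × parameters × tokens. A minimal sketch, with the token budget as an assumption since the tool doesn't state it:

```python
# Back-of-envelope training time from FLOPs ≈ 6 * params * tokens.
# The token budget is an assumption; the tool's internal value may differ.
params     = 70e9        # Llama-3.1 70B
tokens     = 2.3e12      # assumed training tokens (illustrative)
peak_flops = 244.56e18   # cluster FP8 peak from the results panel
mfu        = 0.40        # assumed model FLOPs utilization

seconds = (6 * params * tokens) / (peak_flops * mfu)
print(f"{seconds / 3600:.1f} hours")  # ~2.7 h; the panel's 2.8 implies a slightly larger token budget
```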

Realistic time-to-build
~33 months
incl. ~30 months of grid interconnection queue

Calculation notes: the PUE math is simplified (cooling baseline × climate factor). Capex includes silicon at street prices. Opex covers power, ~4% annual maintenance, and a minimum staffing floor of $2M/yr. Training estimates assume 40% MFU. Real designs require working with mechanical and electrical engineers; this tool teaches relationships, not blueprints.
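
To make those relationships concrete, here is a minimal sketch of the same style of math. Every constant is an assumption chosen to land near the panel above, not the tool's actual source; the GPU count comes out within about 0.3% of the panel's figure.

```python
# Rough sizing model in the spirit of the tool's notes; every constant here
# is an assumption, not the tool's code.
facility_mw   = 100
pue           = 1.10       # total facility power / IT power
rack_kw       = 120        # GB200 NVL72-class rack draw (assumed)
gpus_per_rack = 72
gpu_price     = 41_000     # $ per GPU at street price (assumed)
power_price   = 55         # $ per MWh (assumed)
grid_co2      = 0.30       # tCO2 per MWh, mid-carbon grid mix (assumed)

it_mw = facility_mw / pue              # power left for IT after cooling overhead
racks = int(it_mw * 1000 / rack_kw)
gpus  = racks * gpus_per_rack

energy_mwh = facility_mw * 8760        # full-load energy per year
print(f"{gpus:,} GPUs across ~{racks} racks")
print(f"GPU capex: ${gpus * gpu_price / 1e9:.2f}B")
print(f"Power bill: ${energy_mwh * power_price / 1e6:.1f}M/yr")
print(f"CO2: {energy_mwh * grid_co2:,.0f} t/yr")
```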

☁️ Rent real GPUs by the hour

The fastest way to gain real experience: spin up an actual GPU instance for a few dollars, run a workload, observe what happens. Providers such as RunPod and Vast.ai are where the experimentation happens.

Prices are approximate spot/on-demand for an H100 80GB. Always check current pricing — GPU markets shift weekly.
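
As a sanity check on the "under $5" claim below, the budget math is just rate times hours; both numbers here are illustrative assumptions:

```python
# Budget check for the exercise below (rate is an illustrative assumption).
rate  = 2.75   # $/hr, approximate H100 80GB spot; check current pricing
hours = 1.5    # provision, pull container, short fine-tune, teardown
print(f"~${rate * hours:.2f}")  # about $4, under the $5 budget
```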

🎯 Your first hands-on exercise (under $5)

  1. Sign up for RunPod or Vast.ai.
  2. Spin up a single H100 80GB spot instance (~$2-3/hour).
  3. SSH in. Run nvidia-smi — confirm the GPU is alive.
  4. Pull the official NVIDIA PyTorch container: docker pull nvcr.io/nvidia/pytorch:25.04-py3
  5. Run a 30-minute Llama 3.1 8B fine-tune from Hugging Face.
  6. Watch GPU utilization and memory in nvtop. Note your tokens/sec (a quick measurement sketch follows this list).
  7. Tear it all down. You've spent ~$3 and just done what most people only read about.
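
If you want a number before the fine-tune even starts, here is a minimal generation-throughput check, assuming transformers and accelerate are available in the container. The model ID is illustrative (Llama 3.1 weights are gated on Hugging Face, so any open 8B-class checkpoint works as a stand-in), and the tokens/sec logged by your fine-tune is the figure that matters more.

```python
# Minimal inference tokens/sec check (model ID is an assumption; substitute
# any open 8B-class model if you haven't been granted Llama access).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok("Explain PUE in one paragraph.", return_tensors="pt").to(model.device)
torch.cuda.synchronize()
start = time.time()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```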

🧰 More hands-on tools