Game · Power Budget
Constraints first. Excess optional.
You're a chief infrastructure officer. You have power. You have budget. You have a deadline. Pick a frontier-model training scenario and design a cluster that meets all three constraints. Score = win/lose with explanations.
🎯 Pick a scenario
Mission constraints
200 MW
Power available
$2.0B
Capex budget
90 days
Time limit
💡 Llama-3.1 405B trained on 15.6T tokens ≈ 3.8×10²⁵ FLOPs (6·N·D). To finish in 90 days you need ~4.9 EFLOPS sustained; at 40% MFU that means ~12.2 EFLOPS of theoretical peak.
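The arithmetic behind this hint can be checked with a short sketch. It assumes the 6·N·D rule of thumb quoted in the hint and a fixed 40% MFU; the function names are illustrative and not part of the game.

```python
SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute with the 6*N*D rule of thumb."""
    return 6.0 * params * tokens


def required_sustained_flops(total_flops: float, days: float) -> float:
    """FLOP/s that must be sustained to finish the workload in `days`."""
    return total_flops / (days * SECONDS_PER_DAY)


# Scenario from the hint: 405B parameters, 15.6T tokens, 90-day deadline.
total = training_flops(405e9, 15.6e12)
sustained = required_sustained_flops(total, 90)

# Assumption: 40% MFU, so theoretical peak must be sustained / 0.40.
peak = sustained / 0.40

print(f"total compute : {total:.3e} FLOPs")
print(f"sustained     : {sustained / 1e18:.3f} EFLOPS")
print(f"peak required : {peak / 1e18:.3f} EFLOPS")
```

Under these assumptions the sustained requirement comes out near 4.9 EFLOPS and the peak requirement near 12.2 EFLOPS, which is the figure to compare against a cluster's aggregate datasheet throughput.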
❌ Mission failed
Power used
4.8 MW
Total capex
$157M
Training time
97.7 d
Sustained EFLOPS
4.50
Why your design failed
- Training takes 97.7 days, target is 90. Need more GPUs or faster ones.
The trick: training time is workload size ÷ sustained FLOPS, where sustained FLOPS = GPU count × peak FLOPS per GPU × MFU. An MFU (Model FLOPs Utilization) of 40% is realistic: code overhead, communication, and restarts eat the other 60% of theoretical peak. Power is GPU count × TDP × PUE, where PUE accounts for cooling and other facility overhead. Capex is dominated by silicon and grows roughly linearly with GPU count, so an oversized cluster gets expensive fast. Winning usually requires balance: NOT just maximizing GPUs, but choosing efficient ones (B200 beats H100 per watt).
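A minimal sketch of these relationships follows. The accelerator specs, the PUE of 1.2, and the `Gpu`, `Constraints`, and `evaluate` names are all hypothetical placeholders chosen for illustration; they are not vendor figures or the game's internal values, and capex here counts accelerators only.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Gpu:
    name: str
    peak_flops: float     # theoretical peak FLOP/s per GPU (placeholder)
    tdp_watts: float      # board power per GPU (placeholder)
    unit_cost_usd: float  # price per GPU (placeholder)


@dataclass
class Constraints:
    power_mw: float
    capex_usd: float
    days: float


def evaluate(gpu, count, workload_flops, limits, mfu=0.40, pue=1.2):
    """Score a cluster design against power, budget, and deadline."""
    sustained = count * gpu.peak_flops * mfu           # FLOP/s actually delivered
    days = workload_flops / sustained / SECONDS_PER_DAY
    power_mw = count * gpu.tdp_watts * pue / 1e6       # includes facility overhead
    capex = count * gpu.unit_cost_usd                  # accelerators only, linear in count

    failures = []
    if days > limits.days:
        failures.append(f"Training takes {days:.1f} days, target is {limits.days:g}.")
    if power_mw > limits.power_mw:
        failures.append(f"Power is {power_mw:.1f} MW, limit is {limits.power_mw:g} MW.")
    if capex > limits.capex_usd:
        failures.append(f"Capex is ${capex / 1e6:.0f}M, over budget.")

    return {
        "win": not failures,
        "days": days,
        "power_mw": power_mw,
        "capex_usd": capex,
        "sustained_eflops": sustained / 1e18,
        "failures": failures,
    }


# Hypothetical accelerator: 2 PFLOPS peak, 1 kW, $40k each.
hypo = Gpu("hypothetical-accelerator", 2.0e15, 1000.0, 40_000.0)
limits = Constraints(power_mw=200, capex_usd=2.0e9, days=90)
workload = 3.8e25  # FLOPs, from the scenario hint

too_small = evaluate(hypo, 5_000, workload, limits)
big_enough = evaluate(hypo, 7_000, workload, limits)
print(too_small["win"], too_small["failures"])
print(big_enough["win"], round(big_enough["days"], 1))
```

With these placeholder numbers, 5,000 GPUs miss the 90-day deadline while 7,000 meet it with power and budget to spare, which mirrors the kind of near miss shown in the failed mission above.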