AI Data Centers.
Build them. Operate them. Understand them.
A new generation of facilities — 100MW+ campuses, liquid-cooled GPU racks, InfiniBand fabrics — powers every frontier AI model. This hub is everything we know about that infrastructure: a 12-lesson technical curriculum, a 130+ term glossary, curated courses, and live news from the people actually building it.
The 12-Lesson Curriculum
From "what's a rack" to "design a 100MW AI campus" — learn at your own pace.
01 · Data Center Fundamentals
What a data center actually is, the four-layer Tier classification (Uptime Institute), the components inside a single rack, and why AI changed everything.
02 · Power Infrastructure
From the utility substation to the chip: high-voltage interconnects, UPS systems, generators, PDUs, and the 100MW+ scale that AI demands.
03 · Cooling Systems
Air, liquid, immersion. CRAC vs CDU, direct-to-chip liquid cooling for GPUs, PUE/WUE math, and why every modern AI rack is liquid-cooled.
04 · Compute & Accelerators
NVIDIA H100/H200/B200/GB200 NVL72, AMD MI300X, Google TPU v5p, AWS Trainium2, Cerebras WSE-3. Real specs, real interconnects.
05 · Network Fabric
InfiniBand vs Ethernet (Ultra Ethernet Consortium), NVLink/NVSwitch, optical transceivers, CLOS topology, and rail-optimized layouts.
06 · Storage Architecture
Parallel filesystems (Lustre, WekaFS, VAST), NVMe-oF, checkpoint strategies, and how 100k-GPU clusters move 100GB/s.
07 · Software & Orchestration
SLURM vs Kubernetes for AI, Run.AI, NVIDIA Base Command, gang scheduling, fault tolerance, and the orchestration stack on top of bare metal.
08 · How to Build One
Site selection, permitting, 18-36 month construction timelines, vendor selection, and the realistic capex of a 100MW AI campus.
09 · Operating a Data Center
DCIM, BMS, capacity planning, incident response, the day-to-day of running a critical facility at 99.99% uptime.
10 · Sustainability
PUE/WUE/CUE, hyperscaler net-zero pledges, geothermal partnerships, heat reuse for district heating, water positivity.
11 · Economics & Financing
$/MW capex, opex breakdown, neocloud business models (CoreWeave, Lambda, Crusoe), depreciation cycles, and why capex is exploding.
12 · Careers & How to Become an Expert
Roles, salaries, certifications (Uptime Institute CDCP/CDCS/CDCE, BICSI RCDD), training programs, and the career ladder.
Live AI Infra Intelligence
Refreshed every 12h by the Living Agent — analyzing fresh DC news for operator velocity, tech trends, and weekly shifts. Last updated 5 min ago.
📝 What changed this week
- Capital & Debt Markets Surge: Google and CoreWeave raised a record $5.7B in junk bonds, signaling intense investor appetite for funding AI data center build-out despite high costs and interest rates.
- Supply Chain & Build-Out Risks Intensify: Satellite analysis indicates 40% of planned 2026 AI data centers face delays, compounding pressure from the $250-300B annual spend required, now equated to 5-7 Manhattan Projects.
- Chip & Interconnect Partnerships Deepen: Nvidia invested $2B in Marvell to scale NVLink Fusion, while the IOWN Forum is advancing all-photonic WANs, both targeting critical bottlenecks in inter-GPU and inter-data center communication.
- Infrastructure Operators Expand & Monetize: Elice Group is scaling with modular data centers and planning an IPO, as ever-larger frontier models (the 0.5T-10T parameter class targeted by xAI's Colossus 2) drive demand, with 38% of Americans now living within 5 miles of an operational facility.
Curated by an autonomous agent reading live RSS + entity mentions. Rankings reflect actual coverage frequency, not editorial choice.
Latest Technical News
Filtered for substance: hardware specs, topology, MW, MFU. Press fluff demoted.
Top AI Data Center Operators
Hyperscalers, neoclouds, and colocation providers powering frontier AI.
| Operator | Type | Power (MW) | Notable Site | Specialty |
|---|---|---|---|---|
| Microsoft Azure | Hyperscaler | 5,000 | Mt. Pleasant, WI · Quincy, WA | OpenAI compute partner |
| Google Cloud | Hyperscaler | 4,500 | Council Bluffs, IA · The Dalles, OR | TPU pods + Gemini training |
| Amazon AWS | Hyperscaler | 4,000 | Project Rainier (Anthropic) · 2.2GW | Trainium2/3 + GPU clusters |
| Meta | Hyperscaler | 3,500 | Hyperion, Richland Parish LA · 2GW | Llama training, custom MTIA |
| xAI | Hyperscaler | 750 | Colossus, Memphis TN · 200MW phase 1 | Grok training, 100k+ H100 |
| CoreWeave | Neocloud | 1,300 | Plano, TX · multiple sites | GPU-as-a-service, NVIDIA partner |
| Equinix | Colocation | 1,500 | Global · 260+ data centers | Interconnection + colo |
| Digital Realty | Colocation | 2,700 | Global · 300+ data centers | Wholesale + hyperscale colo |
| Lambda | Neocloud | 200 | Allen, TX · expanding | GPU cloud, on-demand H100/H200 |
| Crusoe Energy | Neocloud | 1,200 | Abilene, TX (Stargate Phase 1) | Stranded-gas powered AI infra |
MW figures are publicly disclosed AI-dedicated capacity, current or planned. Updated continuously from press releases, permit filings, and infrastructure analysis.
Stop reading. Start designing.
The Data Center Designer simulator. Pick scale (1 MW → 2.5 GW), GPU (H100 / B200 / GB200 NVL72 / MI300X / TPU / Trainium), cooling, location, tier — see the real capex, opex, PUE, training throughput, build timeline, and CO₂ math your design produces. Six presets matched to real projects (Stargate, Hyperion, Project Rainier).
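To give a feel for the arithmetic the simulator formalizes, here is a minimal sizing sketch. It is not the simulator's actual model: the ~120 kW-per-rack and 72-GPUs-per-rack figures come from the GB200 NVL72 entry in the glossary below, and the 1.10 PUE is an illustrative assumption.

```python
# Back-of-envelope sizing sketch -- illustrative assumptions, not the simulator's model.

RACK_KW = 120        # ~120 kW per GB200 NVL72 rack (see glossary below)
GPUS_PER_RACK = 72   # Blackwell GPUs per NVL72 rack
PUE = 1.10           # assumed hyperscale-class PUE

def size_campus(it_budget_mw: float) -> dict:
    """Rough rack count, GPU count, and facility draw for a given IT power budget."""
    racks = int(it_budget_mw * 1000 // RACK_KW)
    it_load_mw = racks * RACK_KW / 1000
    return {
        "racks": racks,
        "gpus": racks * GPUS_PER_RACK,
        "it_load_mw": it_load_mw,
        "facility_mw": round(it_load_mw * PUE, 1),  # total draw = IT load x PUE
    }

print(size_campus(100))
# {'racks': 833, 'gpus': 59976, 'it_load_mw': 99.96, 'facility_mw': 110.0}
```

The real design space (cooling choice, tier, location, GPU generation) moves every one of these numbers; that is what the simulator is for.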
Speak the language
From PUE to NVLink — the vocabulary you need to read any data center paper.
PUE (Power Usage Effectiveness)
Total facility power ÷ IT equipment power. 1.0 = perfect, 1.10 = hyperscale, 1.5+ = enterprise.
Direct-to-Chip Liquid Cooling
Coolant flows through cold plates touching the chip. Required for Blackwell-class racks above 70kW.
NVLink
5th gen on Blackwell: 1.8 TB/s bidirectional per GPU. Connects GPUs into a single memory domain.
InfiniBand
NDR = 400 Gbps, XDR = 800 Gbps per port. Dominates scale-out networks for AI training.
HBM (High Bandwidth Memory)
Stacked DRAM next to the GPU die. H100 = HBM3 (3.35 TB/s), B200 = HBM3e (8 TB/s).
MFU (Model FLOPs Utilization)
% of theoretical peak FLOPs your training run actually achieves. 50%+ is great. See the worked sketch after this glossary.
CDU (Coolant Distribution Unit)
Heat exchanger between the rack's liquid loop and the facility loop. Sits at row or rack level.
Tier IV
Fault-tolerant: every component is redundant + concurrently maintainable. 99.995% uptime target.
GB200 NVL72
72 Blackwell GPUs + 36 Grace CPUs in one liquid-cooled rack. ~120 kW. 1.4 EFLOPS FP4.
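A minimal worked sketch of the two ratios defined above, PUE and MFU. The 10 MW / 11 MW facility figures and the 500 TFLOP/s against an assumed ~1 PFLOP/s peak are illustrative assumptions, not measurements from any real site or training run.

```python
# Worked PUE / MFU examples -- illustrative numbers only.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_load_kw

def mfu(achieved_flops: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: fraction of theoretical peak FLOPs actually achieved."""
    return achieved_flops / peak_flops

# A hypothetical 10 MW IT load drawing 11 MW at the meter:
print(f"PUE = {pue(11_000, 10_000):.2f}")    # 1.10 -> hyperscale territory

# A run sustaining 500 TFLOP/s per GPU against an assumed ~1 PFLOP/s theoretical peak:
print(f"MFU = {mfu(500e12, 1e15):.0%}")      # 50% -> 'great' by the rule of thumb above
```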
Become an expert
Real-world courses and certifications used by the people building the largest AI clusters on Earth.
Certified Data Centre Professional (CDCP)
Industry-standard intro covering site selection, racks, power, cooling, fire suppression. The typical first credential for new DC operators.
Accredited Tier Designer (ATD)
Official Uptime certification for designing Tier I-IV facilities. The credential hyperscalers actually look for.
NVIDIA Deep Learning Institute
Hands-on courses on GPU clusters, multi-node training, NVIDIA NIM, CUDA fundamentals. Many free.
Data Center Engineering Specialist (CDES)
Cabling-focused but covers full DC engineering. Required for many mission-critical roles.
Schneider Electric Data Center University
Free vendor-neutral courses on power, cooling, racks, design. Great starting point — no marketing pitch.
Open Compute Project (OCP) Specs & Tracks
Open hardware specifications used by Meta, Microsoft, Google. Read what hyperscalers actually deploy.
Get the data center briefing
Weekly: only the technical news that matters. New papers, MW build-outs, topology decisions, hardware drops.