Live Technical Intelligence · From SemiAnalysis · The Next Platform · DCD · HPCwire

AI Data Centers.
Build them. Operate them. Understand them.

A new generation of facilities — 100MW+ campuses, liquid-cooled GPU racks, InfiniBand fabrics — powers every frontier AI model. This hub is everything we know about that infrastructure: a 12-lesson technical curriculum, a 130+ term glossary, curated courses, and live news from the people actually building it.

120 kW per AI rack · GB200 NVL72 spec; vs 5-10 kW traditional
1.4 EF per NVL72 rack · FP4 inference (NVIDIA Blackwell whitepaper)
100 MW+ for a modern AI campus · vs 5-15 MW traditional enterprise DC
1.10 PUE hyperscale target · vs 1.5+ enterprise (lower is better)

The 12-Lesson Curriculum

From "what's a rack" to "design a 100MW AI campus" — learn at your own pace.

01 · Beginner

01 · Data Center Fundamentals

What a data center actually is, the Uptime Institute's four-tier classification, the components inside a single rack, and why AI changed everything.

12 min read · 4 diagrams · Read →

02 · Beginner

02 · Power Infrastructure

From the utility substation to the chip: high-voltage interconnects, UPS systems, generators, PDUs, and the 100MW+ scale that AI demands.

15 min read · 5 diagrams · Read →

03 · Intermediate

03 · Cooling Systems

Air, liquid, immersion. CRAC vs CDU, direct-liquid cooling for GPUs, PUE/WUE math, and why every modern AI rack is liquid-cooled.

14 min read · 6 diagrams · Read →

04 · Intermediate

04 · Compute & Accelerators

NVIDIA H100/H200/B200/GB200 NVL72, AMD MI300X, Google TPU v5p, AWS Trainium2, Cerebras WSE-3. Real specs, real interconnects.

18 min read · 5 diagrams · Read →

05 · Intermediate

05 · Network Fabric

InfiniBand vs Ethernet (Ultra Ethernet Consortium), NVLink/NVSwitch, optical transceivers, Clos topology, and rail-optimized layouts.

16 min read · 5 diagrams · Read →

06 · Intermediate

06 · Storage Architecture

Parallel filesystems (Lustre, WekaFS, VAST), NVMe-oF, checkpoint strategies, and how 100k-GPU clusters move 100GB/s.

13 min read · 3 diagrams · Read →

07 · Advanced

07 · Software & Orchestration

SLURM vs Kubernetes for AI, Run:ai, NVIDIA Base Command, gang scheduling, fault tolerance, and the orchestration stack on top of bare metal.

14 min read · 3 diagrams · Read →

08 · Advanced

08 · How to Build One

Site selection, permitting, 18-36 month construction timelines, vendor selection, and the realistic capex of a 100MW AI campus.

20 min read · 4 diagrams · Read →

09 · Advanced

09 · Operating a Data Center

DCIM, BMS, capacity planning, incident response, and the day-to-day of running a critical facility at 99.99% uptime.

13 min read · 3 diagrams · Read →

10 · Intermediate

10 · Sustainability

PUE/WUE/CUE, hyperscaler net-zero pledges, geothermal partnerships, heat reuse for district heating, water positivity.

12 min read · 3 diagrams · Read →

11 · Advanced

11 · Economics & Financing

$/MW capex, opex breakdown, neocloud business models (CoreWeave, Lambda, Crusoe), depreciation cycles, and why capex is exploding.

14 min read · 3 diagrams · Read →

12 · Beginner

12 · Careers & How to Become an Expert

Roles, salaries, certifications (Uptime Institute CDCP/CDCS/CDCE, BICSI RCDD), training programs, and the career ladder.

11 min read · 2 diagrams · Read →
🧠 Living Agent · Autonomous

Live AI Infra Intelligence

Refreshed every 12h by the Living Agent — analyzing fresh DC news for operator velocity, tech trends, and weekly shifts. Last updated 5 min ago.

📝 What changed this week

  • Capital & Debt Markets Surge: Google and CoreWeave raised a record $5.7B in junk bonds, signaling intense investor appetite for funding AI data center build-out despite high costs and interest rates.
  • Supply Chain & Build-Out Risks Intensify: Satellite analysis indicates 40% of planned 2026 AI data centers face delays, compounding pressure from the $250-300B annual spend required, now equated to 5-7 Manhattan Projects.
  • Chip & Interconnect Partnerships Deepen: Nvidia invested $2B in Marvell to scale NVLink Fusion, while the IOWN Forum is advancing all-photonic WANs, both targeting critical bottlenecks in inter-GPU and inter-data center communication.
  • Infrastructure Operators Expand & Monetize: Elice Group is scaling with modular data centers and planning an IPO, as frontier build-outs (xAI's Colossus 2, sized for 0.5T-10T-parameter models) drive demand, with 38% of Americans now living within 5 miles of an operational facility.

🏆 Most-mentioned operators (last 7d)

  1. Nvidia · 2 mentions
  2. Google · 1 mention
  3. Microsoft · 1 mention
  4. CoreWeave · 1 mention
  5. OpenAI · 1 mention
  6. Oracle · 1 mention
  7. xAI · 1 mention

From 8 DC-relevant articles · auto-ranked

Curated by an autonomous agent reading live RSS + entity mentions. Rankings reflect actual coverage frequency, not editorial choice.
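
As a rough illustration of how coverage-frequency ranking can work (the operator list, headlines, and matching below are a minimal sketch, not the Living Agent's actual pipeline):

```python
# Count operator mentions across recent headlines and rank by frequency.
from collections import Counter

OPERATORS = ["Nvidia", "Google", "Microsoft", "CoreWeave", "OpenAI", "Oracle", "xAI"]

def rank_mentions(headlines: list[str]) -> list[tuple[str, int]]:
    counts: Counter[str] = Counter()
    for headline in headlines:
        lowered = headline.lower()
        for op in OPERATORS:
            if op.lower() in lowered:
                counts[op] += 1          # one mention per headline per operator
    return counts.most_common()

headlines = [
    "Nvidia invests $2B in Marvell to scale NVLink Fusion",
    "CoreWeave prices record junk bond for GPU build-out",
    "Nvidia Blackwell racks ship to hyperscalers",
]
for rank, (op, n) in enumerate(rank_mentions(headlines), start=1):
    print(f"{rank}. {op} · {n} mention{'s' if n != 1 else ''}")
```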

Latest Technical News

Filtered for substance: hardware specs, topology, MW, MFU. Press fluff demoted.
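
A sketch of how such a substance filter could score articles (the keyword lists and weights are invented for illustration, not the actual ranking code):

```python
# Boost spec-heavy language, demote press fluff; higher score = more substance.
import re

SUBSTANCE = ["mw", "kw", "mfu", "pue", "topology", "infiniband", "nvlink", "hbm"]
FLUFF = ["thrilled", "synergy", "game-changing", "excited to announce"]

def substance_score(text: str) -> int:
    t = text.lower()
    score = sum(t.count(kw) for kw in SUBSTANCE)
    score -= 2 * sum(t.count(kw) for kw in FLUFF)
    if re.search(r"\d+(\.\d+)?\s*(mw|kw|gb/s|tb/s)", t):
        score += 2    # concrete numbers with units are a strong signal
    return score

print(substance_score("New 300 MW campus details NVLink and InfiniBand fabric"))   # 5
print(substance_score("We are thrilled to announce a game-changing partnership"))  # -4
```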

Indexing data center articles… check back in a few hours as the new RSS feeds catch up.

Top AI Data Center Operators

Hyperscalers, neoclouds, and colocation providers powering frontier AI.

| Operator | Type | Power (MW) | Notable Site | Specialty |
|---|---|---|---|---|
| Microsoft Azure | Hyperscaler | 5,000 | Mt. Pleasant, WI · Quincy, WA | OpenAI compute partner |
| Google Cloud | Hyperscaler | 4,500 | Council Bluffs, IA · The Dalles, OR | TPU pods + Gemini training |
| Amazon AWS | Hyperscaler | 4,000 | Project Rainier (Anthropic) · 2.2GW | Trainium2/3 + GPU clusters |
| Meta | Hyperscaler | 3,500 | Hyperion, Richland Parish LA · 2GW | Llama training, custom MTIA |
| xAI | Hyperscaler | 750 | Colossus, Memphis TN · 200MW phase 1 | Grok training, 100k+ H100 |
| CoreWeave | Neocloud | 1,300 | Plano, TX · multiple sites | GPU-as-a-service, NVIDIA partner |
| Equinix | Colocation | 1,500 | Global · 260+ data centers | Interconnection + colo |
| Digital Realty | Colocation | 2,700 | Global · 300+ data centers | Wholesale + hyperscale colo |
| Lambda | Neocloud | 200 | Allen, TX · expanding | GPU cloud, on-demand H100/H200 |
| Crusoe Energy | Neocloud | 1,200 | Abilene, TX (Stargate Phase 1) | Stranded-gas powered AI infra |

MW figures are publicly disclosed AI-dedicated capacity, current or planned. Updated continuously from press releases, permit filings, and infrastructure analysis.

✨ NEW · Interactive

Stop reading. Start designing.

The Data Center Designer simulator. Pick scale (1 MW → 2.5 GW), GPU (H100 / B200 / GB200 NVL72 / MI300X / TPU / Trainium), cooling, location, tier — see the real capex, opex, PUE, training throughput, build timeline, and CO₂ math your design produces. Six presets matched to real projects (Stargate, Hyperion, Project Rainier).

Open the Designer → · Cloud lab pathway · First exercise <$5
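
To get a feel for the kind of math the Designer runs, here's a deliberately simplified cost sketch (the $/MW capex and power-price coefficients are illustrative assumptions, not the simulator's actual model):

```python
# Toy capex/energy model for an AI campus; all coefficients are assumptions.
def campus_model(it_mw: float, pue: float = 1.10,
                 capex_per_it_mw_musd: float = 12.0,  # assumed all-in $M per IT MW
                 power_price_kwh: float = 0.06):      # assumed industrial $/kWh
    capex_musd = it_mw * capex_per_it_mw_musd
    facility_mw = it_mw * pue                         # PUE scales the utility draw
    annual_mwh = facility_mw * 24 * 365
    power_opex_musd = annual_mwh * 1000 * power_price_kwh / 1e6
    return capex_musd, power_opex_musd

capex, power_opex = campus_model(it_mw=100)
print(f"capex ≈ ${capex:,.0f}M · power opex ≈ ${power_opex:,.0f}M/yr")
# -> capex ≈ $1,200M · power opex ≈ $58M/yr
```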

Speak the language

From PUE to NVLink — the vocabulary you need to read any data center paper.

PUE · Power Usage Effectiveness

Total facility power ÷ IT equipment power. 1.0 = perfect, 1.10 = hyperscale, 1.5+ = enterprise.
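
In code form, with example figures (35 MW of IT load inside a 38.5 MW facility draw):

```python
# PUE = total facility power / IT equipment power.
it_power_mw = 35.0          # servers, storage, network gear
facility_power_mw = 38.5    # IT load plus cooling, UPS losses, lighting
pue = facility_power_mw / it_power_mw
print(f"PUE = {pue:.2f}")   # -> PUE = 1.10
```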

DLC · Direct Liquid Cooling

Coolant flows through cold plates touching the chip. Required for Blackwell-class racks above 70kW.

NVLink · NVIDIA's GPU interconnect

5th gen on Blackwell: 1.8 TB/s bidirectional per GPU. Connects GPUs into a single-memory domain.

InfiniBand · High-speed scale-out fabric

NDR = 400 Gbps, XDR = 800 Gbps per port. Dominates scale-out networks for AI training.

HBM · High Bandwidth Memory

Stacked DRAM next to the GPU die. H100 = HBM3 (3.35 TB/s), B200 = HBM3e (8 TB/s).

MFU · Model FLOPs Utilization

% of theoretical peak FLOPs your training run actually achieves. 50%+ is great.
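
A worked example (the model size, throughput, and GPU count are illustrative; ~6N FLOPs per token is the standard dense-transformer approximation):

```python
# MFU = sustained FLOP/s / theoretical peak FLOP/s.
params = 70e9                      # 70B-parameter dense model
tokens_per_sec = 1.0e6             # observed cluster-wide training throughput
flops_per_token = 6 * params       # ~6N FLOPs/token for a dense transformer

achieved = tokens_per_sec * flops_per_token   # FLOP/s actually sustained
peak = 1024 * 989e12                          # 1,024 H100s · 989 TFLOPS BF16 each
print(f"MFU = {achieved / peak:.1%}")         # -> MFU = 41.5%
```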

CDU · Coolant Distribution Unit

Heat exchanger between the rack's liquid loop and the facility loop. Sits at row or rack level.

Tier IV · Uptime Institute rating

Fault-tolerant: every component is redundant + concurrently maintainable. 99.995% uptime target.
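
That availability target translates directly into an annual downtime budget:

```python
# Downtime allowed per year at 99.995% availability.
minutes_per_year = 365 * 24 * 60                  # 525,600
allowed_minutes = minutes_per_year * (1 - 0.99995)
print(f"{allowed_minutes:.1f} minutes/year")      # -> 26.3 minutes/year
```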

GB200 NVL72 · NVIDIA Blackwell rack-scale system

72 B200 GPUs + 36 Grace CPUs in one liquid-cooled rack. ~120 kW. 1.4 EFLOPS FP4.

Become an expert

Real-world courses and certifications used by the people building the largest AI clusters on Earth.

Get the data center briefing

Weekly: only the technical news that matters. New papers, MW build-outs, topology decisions, hardware drops.

Subscribe →