gentic.news — AI News Intelligence Platform
Compute Lab

AI data centers, power, GPUs and the build-out of frontier compute.

Live Technical Intelligence
From SemiAnalysis · The Next Platform · DCD · HPCwire

AI Data Centers.
Build them. Operate them. Understand them.

Stargate is racing to 7 GW across five new sites. Meta's 5 GW Hyperion in Louisiana spans 11 buildings. xAI's Colossus 2 is targeting 1 million GPUs. Microsoft Fairwater is projected past $100 B. The bottleneck is no longer money or chips — it’s power, grid interconnects, and gas-turbine wait times. This hub is everything we know about the build-out: a 12-lesson technical curriculum, a 205-term glossary, hands-on calculators, named-campus case studies, and live news from the people actually building it.

$29B
Per gigawatt
Typical capex for a 1 GW AI campus (2026)
5 GW
Largest AI campus
Meta Hyperion, Louisiana — 11 buildings
1.6 GW+
xAI Colossus 2
Memphis · 550K-1M GPUs targeted by 2026
1.10 PUE
Hyperscale target
Microsoft Fairwater · closed-loop liquid cooling
Compute Lab brief · Jun 2

What changed this week

Auto-generated from the Lab’s knowledge graph. Findings are produced every 12 hours by our agentic research pipeline.

The 12-Lesson Curriculum

From "what's a rack" to "design a 100MW AI campus" — learn at your own pace.

Beginner

01 · Data Center Fundamentals

What a data center actually is, the four-level Tier classification (Uptime Institute, Tier I–IV), the components inside a single rack, and why AI changed everything.

12 min read · 4 diagrams
Beginner

02 · Power Infrastructure

From the utility substation to the chip: high-voltage interconnects, UPS systems, generators, PDUs, and the 100MW+ scale that AI demands.

15 min read · 5 diagrams
Intermediate

03 · Cooling Systems

Air, liquid, immersion. CRAC vs CDU, direct-liquid cooling for GPUs, PUE/WUE math, and why every modern AI rack is liquid-cooled.

14 min read · 6 diagrams
Intermediate

04 · Compute & Accelerators

NVIDIA H100/H200/B200/GB200 NVL72, AMD MI300X, Google TPU v5p, AWS Trainium2, Cerebras WSE-3. Real specs, real interconnects.

18 min read · 5 diagrams
Intermediate

05 · Network Fabric

InfiniBand vs Ethernet (Ultra Ethernet Consortium), NVLink/NVSwitch, optical transceivers, Clos topology, and rail-optimized layouts.

16 min read · 5 diagrams
Intermediate

06 · Storage Architecture

Parallel filesystems (Lustre, WekaFS, VAST), NVMe-oF, checkpoint strategies, and how 100k-GPU clusters move 100GB/s.

13 min read · 3 diagrams
Advanced

07 · Software & Orchestration

SLURM vs Kubernetes for AI, Run:ai, NVIDIA Base Command, gang scheduling, fault tolerance, and the orchestration stack on top of bare metal.

14 min read · 3 diagrams
Advanced

08 · How to Build One

Site selection, permitting, 18-36 month construction timelines, vendor selection, and the realistic capex of a 100MW AI campus.

20 min read · 4 diagrams
Advanced

09 · Operating a Data Center

DCIM, BMS, capacity planning, incident response, the day-to-day of running a critical facility at 99.99% uptime.

13 min read · 3 diagrams
Intermediate

10 · Sustainability

PUE/WUE/CUE, hyperscaler net-zero pledges, geothermal partnerships, heat reuse for district heating, water positivity.

12 min read · 3 diagrams
Advanced

11 · Economics & Financing

$/MW capex, opex breakdown, neocloud business models (CoreWeave, Lambda, Crusoe), depreciation cycles, and why CapEx is exploding.

14 min read · 3 diagrams
Beginner

12 · Careers & How to Become an Expert

Roles, salaries, certifications (EPI CDCP/CDCS/CDCE, Uptime Institute ATD, BICSI RCDD), training programs, and the career ladder.

11 min read · 2 diagrams
🧠 Living Agent · Autonomous

Live AI Infra Intelligence

Refreshed every 12h by the Living Agent — analyzing fresh DC news for operator velocity, tech trends, and weekly shifts. Last updated 7h ago.

📝 What changed this week

  • CoreWeave receives first Nvidia Vera Rubin NVL72 rack — Dell ships production unit ahead of schedule, signaling Nvidia’s accelerated next-gen platform transition. Implication: Vera Rubin NVL72 may compress Blackwell’s lifespan, altering hyperscale procurement cycles and secondary-market GPU availability.

  • xAI abandons JAX, builds custom C training framework — Decision driven by sub-10% Model FLOPs Utilization (MFU) on existing stacks. Implication: Major hyperscaler is now vertically integrating training software, potentially fragmenting the ML compiler ecosystem and pressuring JAX/PyTorch maintainers to close performance gaps.

  • Blackwell NVLink breaks confidential compute, 61% regression — NVLink integrity checks expose large overhead when enabling TEE. Implication: Multi-tenant GPU clusters for sensitive workloads (finance, healthcare) face severe throughput penalties, possibly delaying Blackwell adoption in regulated verticals.

  • Huawei achieves 1.5µm bond pitch on Kirin 2026 — Beats TSMC’s 1.8µm in hybrid bonding. Implication: Chinese AI chip supply chain may bypass leading-edge lithography constraints via advanced packaging, threatening Nvidia’s domestic market share and prompting export control revisions.

  • ERCOT data center requests exceed grid capacity by 5x — Texas grid faces unprecedented interconnection backlog. Implication: Hyperscalers may pivot to on-site generation (gas turbines, small modular reactors) or shift buildout to less constrained regions, reshaping DC site-selection economics.

  • AI data centers hit 2M gallons per day per campus water wall — Cooling demand strains municipal water supplies. Implication: Operators face regulatory pushback and rising water costs; adoption of direct-to-chip liquid cooling and closed-loop systems becomes a competitive necessity, not an option.

🚀 Trending hardware/tech

    GB200 NVL72 ×1 · B200 ×1 · Cerebras WSE-3 ×1

Curated by an autonomous agent reading live RSS + entity mentions. Rankings reflect actual coverage frequency, not editorial choice.

Latest Technical News

Filtered for substance: hardware specs, topology, MW, MFU. Press fluff demoted.

Opinion & Analysis
82

SemiAnalysis Calls Jensen Computex Keynote 'F Tier' Over No AI DC News

SemiAnalysis rated Jensen Huang's Computex keynote 'F Tier' for containing no AI data center news and revealed a delayed NVIDIA ARM chip with broken video output.

x.com · 20h ago · 3 min read
arm chips · ai hardware · nvidia
Products & Launches
100

Dell Ships First Nvidia Vera Rubin NVL72 Rack to CoreWeave

Dell delivered the first Nvidia Vera Rubin NVL72 rack to CoreWeave. Each rack packs 72 Rubin GPUs, 36 Vera CPUs, 3.6 exaFLOPS FP4 inference, 75 TB memory, and 260 TB/s NVLink bandwidth.

x.com · 1d ago · 3 min read · Widely Reported
dell · hardware · coreweave
AI Research
100

Blackwell NVLink Breaks Confidential Compute, 61% Regression Reported

NVIDIA Blackwell confidential computing disables NVLink multicast, causing 61% regression on SGLang Qwen3.5 397B. Hopper had unencrypted NVLink, compounding the issue.

x.com · 3d ago · 3 min read · Multi-Source
ai inference · hardware · security
Products & Launches
87

ERCOT datacenter requests exceed grid capacity by 5x

ERCOT datacenter requests far exceed grid underwriting capacity, per @SemiAnalysis_, revealing grid approval as a binding constraint on AI infrastructure buildout.

x.com · 3d ago · 3 min read
ai infrastructure · data centers · energy
Policy & Ethics
85

AI Data Centers Hit Water Wall: 2M Gallons Per Day Per Campus

Water capacity is now a siting gatekeeper for AI data centers. A Virginia campus requested 2M gallons per day; Georgia told a 6 MGD project 'we just don't have the water.'

datacenterknowledge.com · 3d ago · 3 min read · Widely Reported
ai infrastructure · cooling · data centers
Products & Launches
85

Google and Blackstone Launch TPU Venture, Challenging Nvidia Dominance

Google and Blackstone launched a TPU venture, financing AI infrastructure outside the hyperscale cloud model. Enterprise buyers get a standalone alternative to Nvidia-dominated GPU clusters.

news.google.com · May 21, 2026 · 3 min read · Widely Reported
ai infrastructure · hardware · cloud computing

Top AI Data Center Operators

Hyperscalers, neoclouds, and colocation providers powering frontier AI.

Operator | Type | Power (MW) | Notable Site | Specialty
Microsoft Azure | Hyperscaler | 5,000 | Fairwater, WI · projected $100B+ build | OpenAI compute partner · GB200 at scale
Google Cloud | Hyperscaler | 4,500 | Council Bluffs, IA · The Dalles, OR · Kronstorf, AT | TPU pods + Gemini training · 5GW Anthropic deal
Amazon AWS | Hyperscaler | 4,000 | Project Rainier, New Carlisle IN · 2.2 GW | Trainium2/3 powering Anthropic
Meta | Hyperscaler | 5,000 | Hyperion, Richland Parish LA · 5 GW · 11 buildings | Llama + MTIA · Prometheus, Ohio coming May 2026
OpenAI / Stargate | Hyperscaler | 7,000 | Abilene, TX · 1.2 GW by mid-2026 · 5 new sites announced | Oracle + SoftBank · 10 GW by 2027 · ~$400B committed
xAI / SpaceX | Hyperscaler | 1,600 | Colossus 2, Memphis TN · 550K-1M GPUs · Colossus 1 leased to Anthropic | Grok training · Vera Rubin roadmap
Anthropic | Hyperscaler | 5,300 | Project Rainier (AWS) · Colossus 1 (SpaceX) · Fluidstack | Multi-vendor compute · 5 stacked deals
CoreWeave | Neocloud | 1,300 | Plano, TX · multiple sites | GPU-as-a-service, NVIDIA partner
Equinix | Colocation | 1,500 | Global · 260+ data centers | Interconnection + colo
Digital Realty | Colocation | 2,700 | Global · 300+ data centers | Wholesale + hyperscale colo
Lambda | Neocloud | 200 | Allen, TX · expanding | GPU cloud, on-demand H100/H200
Crusoe Energy | Neocloud | 1,200 | Abilene, TX (Stargate Phase 1) | Stranded-gas powered AI infra

MW figures are publicly disclosed AI-dedicated capacity, current or planned. Updated continuously from press releases, permit filings, and infrastructure analysis.

Operator momentum

Top 8 DC operators by mention growth this week. Sourced from the Compute Lab brief.

  1. Nvidia
    4 mentions
  2. Intel
    2 mentions
  3. CoreWeave
    1 mention
  4. xAI
    1 mention
✨ NEW · Interactive

Stop reading. Start designing.

The Data Center Designer simulator. Pick scale (1 MW → 2.5 GW), GPU (H100 / B200 / GB200 NVL72 / MI300X / TPU / Trainium), cooling, location, tier — see the real capex, opex, PUE, training throughput, build timeline, and CO₂ math your design produces. Six presets matched to real projects (Stargate, Hyperion, Project Rainier).

Open the Designer →
+ Cloud lab pathway · First exercise <$5

Named campuses, decoded

Deep dives on the real gigawatt-scale projects — verified specs, cited sources, strategic analysis.

Hands-on tools

Every tool is interactive, browser-only, no signup. Built to teach by doing.

Who is this for?

We have tailored reading paths for 4 audiences. Pick yours.

Speak the language

From PUE to NVLink — the vocabulary you need to read any data center paper.

PUE · Power Usage Effectiveness

Total facility power ÷ IT equipment power. 1.0 = perfect, 1.10 = hyperscale, 1.5+ = enterprise.
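
As a minimal sketch of the ratio above: the function names and the 100 MW IT load are illustrative assumptions, not figures from any named campus. It shows how much non-IT overhead each reference PUE implies.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def overhead_kw(it_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, distribution losses, lighting) implied by a PUE."""
    return it_kw * (pue_value - 1.0)


# Hypothetical 100 MW IT load at the three reference points from the definition.
IT_LOAD_KW = 100_000.0
for label, p in [("perfect", 1.0), ("hyperscale", 1.10), ("enterprise", 1.5)]:
    mw = overhead_kw(IT_LOAD_KW, p) / 1000.0
    print(f"{label:<10} PUE {p:.2f}: about {mw:.0f} MW of overhead")
```

Under these assumptions the gap between 1.10 and 1.5 is about 40 MW of overhead on the same IT load, which is why the metric gets so much attention at campus scale.
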

DLC · Direct Liquid Cooling

Coolant flows through cold plates touching the chip. Required for Blackwell-class racks above 70kW.

NVLink · NVIDIA's GPU interconnect

5th gen on Blackwell: 1.8 TB/s bidirectional per GPU. Connects GPUs into a single-memory domain.

InfiniBand · High-speed scale-out fabric

NDR = 400 Gbps, XDR = 800 Gbps per port. Dominates scale-out networks for AI training.

HBM · High Bandwidth Memory

Stacked DRAM next to the GPU die. H100 = HBM3 (3.35 TB/s), B200 = HBM3e (8 TB/s).

MFU · Model FLOPs Utilization

% of theoretical peak FLOPs your training run actually achieves. 50%+ is great.
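
One common way to estimate MFU for a dense transformer is to count roughly 6 FLOPs per parameter per trained token (forward plus backward pass). That approximation, the function name, and every number in the example (model size, throughput, GPU count, per-GPU peak) are assumptions for illustration, not measurements from any cluster mentioned on this page.

```python
def mfu(params: float, tokens_per_second: float, n_gpus: int,
        peak_flops_per_gpu: float) -> float:
    """Estimate Model FLOPs Utilization for dense transformer training.

    Uses the ~6 * params FLOPs-per-token approximation, so it ignores
    attention FLOPs and any recomputation (activation checkpointing).
    """
    achieved_flops = 6.0 * params * tokens_per_second
    peak_flops = n_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Hypothetical run: 70B-parameter model, 1,024 GPUs, 1.0M tokens/s cluster-wide,
# and an assumed dense BF16 peak of 989 TFLOPS per GPU.
estimate = mfu(params=70e9, tokens_per_second=1.0e6, n_gpus=1024,
               peak_flops_per_gpu=989e12)
print(f"Estimated MFU: {estimate:.1%}")
```

The result is only as good as the peak figure you plug in: using a sparse or lower-precision peak for a dense BF16 run will understate MFU.
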

CDU · Coolant Distribution Unit

Heat exchanger between the rack's liquid loop and the facility loop. Sits at row or rack level.

Tier IV · Uptime Institute rating

Fault-tolerant: every component is redundant + concurrently maintainable. 99.995% uptime target.
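
To make the uptime percentage concrete, here is a small sketch that converts an availability target into allowed downtime per year. The function name is illustrative, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes per year a facility may be down at a given availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


for target in (99.99, 99.995):
    print(f"{target}% uptime allows about "
          f"{allowed_downtime_minutes(target):.1f} minutes of downtime per year")
```

Moving from 99.99% to 99.995% halves the annual downtime budget, from under an hour to under half an hour.
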

GB200 NVL72 · NVIDIA Blackwell rack-scale system

72 B200 GPUs + 36 Grace CPUs in one liquid-cooled rack. ~120 kW. 1.4 EFLOPS FP4.
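
The rack-level figures can be broken down per GPU with simple division. This sketch uses only numbers quoted in this glossary (72 GPUs, ~120 kW, 1.4 EFLOPS FP4, 1.8 TB/s NVLink per GPU); the variable names are illustrative, and the per-GPU power share is an upper bound on GPU draw because the rack total also covers the Grace CPUs, switch trays, and other components.

```python
GPUS_PER_RACK = 72
RACK_POWER_KW = 120.0        # "~120 kW" rack figure
RACK_FP4_EFLOPS = 1.4        # rack-level FP4 figure
NVLINK_TB_S_PER_GPU = 1.8    # 5th-gen NVLink, bidirectional per GPU

# Rack power divided evenly across GPU slots (includes non-GPU components).
power_share_kw = RACK_POWER_KW / GPUS_PER_RACK

# Rack FP4 throughput per GPU, converted from EFLOPS to PFLOPS.
fp4_pflops_per_gpu = RACK_FP4_EFLOPS * 1000.0 / GPUS_PER_RACK

# Sum of per-GPU NVLink bandwidth across the rack.
nvlink_aggregate_tb_s = GPUS_PER_RACK * NVLINK_TB_S_PER_GPU

print(f"Power share per GPU slot: {power_share_kw:.2f} kW")
print(f"FP4 per GPU:              {fp4_pflops_per_gpu:.1f} PFLOPS")
print(f"Aggregate NVLink:         {nvlink_aggregate_tb_s:.1f} TB/s")
```
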

Become an expert

Real-world courses and certifications used by the people building the largest AI clusters on Earth.

EPI (EXIN-accredited) · ~$2,000–2,500

Certified Data Centre Professional (CDCP®)

The de-facto entry credential for data center facilities. EXIN-accredited, valid 3 years. 40-question exam (27/40 to pass). Delivered in 50+ countries via partners.

2 days + 1-hour exam · Beginner
Uptime Institute · $4,985

Accredited Tier Designer (ATD)

The credential for designing Tier-rated facilities. PE licence (or equivalent) required. What hyperscalers and MEP firms actually look for.

16 hours over 5 half-days + proctored exam · Advanced
NVIDIA · Free audits + paid workshops (~$500)

NVIDIA Deep Learning Institute (DLI)

CUDA, multi-node training, NCCL, Base Command. Paid courses include live GPU labs. Maps to NCA-AIIO ($125) and NCP-AII ($400) certifications.

Self-paced + 8h instructor-led · All levels
BICSI · $510 member / $725 non-member (exam)

Data Center Design Consultant (DCDC®)

BICSI's data-center–specific design credential. 100 questions, drag-and-drop + multiple choice. Requires RCDD or 3 years DC experience. Pearson VUE delivered.

Self-study + 2-hour computer-based exam · Advanced
Schneider Electric · Free

Schneider Electric University (formerly DCU)

200+ vendor-neutral modules on power, cooling, racks, design, sustainability. CPD-accredited. Optional DCCA (Data Center Certified Associate) exam.

Self-paced, ~1h modules · Beginner
Open Compute Project · Free

OCP Academy

Official learning platform for Open Compute Project specs. Modules include 'Open Systems for AI' (6-part series), Open Rack ORv3, and OCP-Recognized Equipment.

Self-paced · Advanced

Connect the labs

Compute Lab is the steel. Click through to see what runs on it.

Frequently asked questions

What is an AI data center, and how is it different from a traditional one?
An AI data center is a facility purpose-built to train and serve large AI models. Unlike traditional cloud or enterprise data centers, AI sites are GPU-dense (NVIDIA H100/H200/B200, AMD MI300X), wired with InfiniBand or NVLink fabrics for ultra-low-latency all-reduce traffic, and almost always liquid-cooled because rack densities exceed 100 kW. They also draw far more power per rack — frontier campuses now run at 100 MW–1 GW, comparable to small cities — and prioritize uptime under sustained training loads measured in weeks.
How much power does an AI data center use?
Modern AI training campuses run between 100 MW and 1 GW, and the largest announced sites are planned at several gigawatts. NVIDIA's GB200 NVL72 rack alone draws ~120 kW, so a 100 MW AI facility supports roughly 750–830 such racks, depending on how much of that 100 MW goes to cooling and power-distribution overhead. Stargate (OpenAI/Oracle/SoftBank) targets 5+ GW across multiple sites. Hyperion (Meta), Project Rainier (Amazon), and xAI's Colossus range from the multi-hundred-MW class to multiple gigawatts. The IEA forecasts global data-center electricity demand could double by 2030 if AI growth continues at the current pace, with AI representing ~40% of the increase.
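
A rough check of the rack-count arithmetic in the answer above. The function name and the PUE values are assumptions for illustration; the result depends on whether the 100 MW figure is read as total facility power or as IT power alone.

```python
import math


def racks_supported(facility_mw: float, rack_kw: float = 120.0,
                    pue: float = 1.10) -> int:
    """Whole racks a facility can power, treating facility_mw as total power.

    IT power is total power divided by PUE; pass pue=1.0 to treat
    facility_mw as pure IT load.
    """
    it_kw = facility_mw * 1000.0 / pue
    return math.floor(it_kw / rack_kw)


GPUS_PER_NVL72 = 72

for assumed_pue in (1.0, 1.10, 1.3):
    racks = racks_supported(100.0, pue=assumed_pue)
    print(f"PUE {assumed_pue:.2f}: {racks} racks, "
          f"about {racks * GPUS_PER_NVL72:,} GPUs")
```

Real deployments also reserve power for storage, networking, and CPU-only nodes, so the GPU-rack count in practice would be lower than this upper bound.
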
Why do AI data centers need liquid cooling?
Air cooling tops out at roughly 30–40 kW per rack. AI racks like NVL72 and B200 reference designs go beyond 100 kW. Liquid carries heat ~3,000× more efficiently than air per unit volume, so direct-to-chip cold plates and immersion are the only practical options at GPU densities above 50 kW. Liquid cooling also lets operators tighten PUE (Power Usage Effectiveness) toward 1.1, recovering a meaningful fraction of waste heat. Vertiv, Schneider Electric, and CoolIT supply most of the modern AI cooling stack.
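
The volumetric advantage of liquid can be sanity-checked with the basic heat-transport relation Q = ρ · V̇ · c_p · ΔT. The sketch below compares the flow of water and of air needed to remove the heat of one ~120 kW rack. The fluid properties are approximate room-temperature values and the 10 K coolant temperature rise is an assumption, not a vendor specification.

```python
# Approximate fluid properties near room temperature (assumed values).
WATER_RHO = 997.0   # density, kg/m^3
WATER_CP = 4186.0   # specific heat, J/(kg*K)
AIR_RHO = 1.2       # density, kg/m^3
AIR_CP = 1005.0     # specific heat, J/(kg*K)


def volumetric_flow_m3_s(heat_w: float, rho: float, cp: float,
                         delta_t_k: float) -> float:
    """Flow needed to carry heat_w with a coolant temperature rise of delta_t_k."""
    return heat_w / (rho * cp * delta_t_k)


RACK_HEAT_W = 120_000.0  # one ~120 kW rack
DELTA_T_K = 10.0         # assumed inlet-to-outlet temperature rise

water_flow = volumetric_flow_m3_s(RACK_HEAT_W, WATER_RHO, WATER_CP, DELTA_T_K)
air_flow = volumetric_flow_m3_s(RACK_HEAT_W, AIR_RHO, AIR_CP, DELTA_T_K)
water_l_min = water_flow * 1000.0 * 60.0
ratio = air_flow / water_flow

print(f"Water: {water_l_min:.0f} L/min")
print(f"Air:   {air_flow:.1f} m^3/s")
print(f"Air needs about {ratio:,.0f}x the volume of water")
```

With these property values the ratio comes out in the low thousands, in line with the order of magnitude quoted above.
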
What is InfiniBand, and why does it matter for AI training?
InfiniBand is a low-latency, high-bandwidth networking fabric (NVIDIA Quantum-2 = 400 Gb/s per port; Quantum-X800 = 800 Gb/s per port) used to connect GPUs across thousands of nodes for synchronous training. Data-parallel training all-reduces gradients across the cluster on every step, so a slow fabric throttles the entire run. Ethernet (Ultra Ethernet Consortium, NVIDIA Spectrum-X) is closing the gap, but InfiniBand still dominates frontier training clusters because of predictable tail latency under congestion.
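
To see why link speed matters, here is a bandwidth-only estimate for a ring all-reduce, where each GPU sends and receives about 2(N−1)/N times the payload. This is a simplified sketch: it ignores latency, overlap with compute, hierarchical reduction over NVLink, and gradient sharding, and the model size and GPU count are hypothetical.

```python
def ring_allreduce_seconds(payload_bytes: float, n_gpus: int,
                           link_gbps: float) -> float:
    """Bandwidth-only lower bound on ring all-reduce time.

    Each GPU moves 2 * (N - 1) / N * payload_bytes over a link
    of link_gbps gigabits per second.
    """
    if n_gpus < 2:
        return 0.0
    bytes_per_second = link_gbps * 1e9 / 8.0
    traffic_bytes = 2.0 * (n_gpus - 1) / n_gpus * payload_bytes
    return traffic_bytes / bytes_per_second


# Hypothetical: 70B parameters with 2-byte gradients, reduced across 1,024 GPUs.
GRADIENT_BYTES = 70e9 * 2
for gbps in (400, 800):
    seconds = ring_allreduce_seconds(GRADIENT_BYTES, 1024, gbps)
    print(f"{gbps} Gb/s per GPU: about {seconds:.2f} s per full all-reduce")
```

Doubling per-port bandwidth halves this bound, which is the intuition behind the move from 400 Gb/s to 800 Gb/s fabrics. Production stacks hide much of the remaining cost by overlapping communication with the backward pass.
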
Who are the top AI data center operators in 2026?
Hyperscalers: Microsoft Azure, Amazon AWS, Google Cloud, Meta. Neoclouds (GPU-focused): CoreWeave, Crusoe, Lambda, Nebius, Voltage Park. Sovereign/regional: G42 (UAE), Stargate (US), Mistral/Iliad (EU), Yotta (India). Colocation: Equinix, Digital Realty, Vantage. The neocloud category exploded in 2025-2026 as AI labs sought bare-metal H200/B200 capacity outside the hyperscaler queue. Frontier compute increasingly lives in greenfield builds near cheap power (Texas, Iowa, the Nordics, the Gulf).
Is it true AI will exceed grid capacity?
In several U.S. and EU regions it already has. ERCOT (Texas), PJM (Mid-Atlantic), and Ireland's EirGrid have all paused or capped new data-center interconnects. Operators are responding with on-site gas generation, behind-the-meter solar+battery, and direct PPAs with new nuclear (Microsoft–Constellation Three Mile Island, Amazon–Talen, Google–Kairos SMRs). The bottleneck is not silicon — it's substations, transformers, and transmission. Grid queue times of 4–7 years now exceed GPU lead times.
How do I learn how AI data centers actually work?
Start with our 12-lesson curriculum at /ai-data-centers/learn — it covers racks, power, cooling, networking, and capacity planning, ending with a 100 MW campus design exercise. The /ai-data-centers/glossary covers 200+ terms (PUE, NVL72, Tier ratings, MFU, BBU). For deeper reading, follow SemiAnalysis, The Next Platform, Data Center Dynamics (DCD), and HPCwire — these are also the sources we ingest live for the news section on this page.
