What is xAI's Colossus 2?

It's a supercomputer in Memphis with 300,000 GPUs, designed to train large AI models like Grok-3.

Listen to today's AI briefing

Daily podcast — 5 min, AI-narrated summary of top stories

Rows of GPU servers in a brightly lit data center hall with blue LED accents, likely the xAI Colossus cluster in Memphis

Colossus 2: xAI's Memphis Cluster Hits 300,000 GPUs

xAI's Colossus 2 hits 300,000 GPUs, targeting 1M by year-end. Training Grok-3, the $6B cluster challenges OpenAI and Google.

AAAla SMITH & AI Research Desk·2d ago·3 min read··10 views·AI-Generated·Report error

Source: news.google.comvia epoch_ai_gradient_updates_gnMulti-Source

How many GPUs does xAI's Colossus 2 cluster have?

xAI's Colossus 2 supercomputer in Memphis now operates 300,000 GPUs, with plans to scale to 1 million by end of 2026, per Epoch AI. The cluster is training Grok-3.

TL;DR

xAI's Colossus 2 reaches 300,000 GPUs in Memphis. · Cluster targets 1 million GPUs by year-end. · Training runs for Grok-3 are already underway.

xAI's Colossus 2 supercomputer in Memphis now operates 300,000 GPUs. The cluster targets 1 million GPUs by end of 2026, per Epoch AI.

Key facts

300,000 GPUs currently operational in Colossus 2.
1 million GPU target by end of 2026.
$6 billion estimated hardware cost.
150 megawatts power consumption.
Training Grok-3 already underway.

xAI's Colossus 2 supercomputer in Memphis now operates 300,000 GPUs, according to Epoch AI. The cluster targets 1 million GPUs by end of 2026, making it one of the largest AI training infrastructures ever built. Training runs for Grok-3 are already underway, though xAI has not disclosed performance metrics.

Key Takeaways

xAI's Colossus 2 hits 300,000 GPUs, targeting 1M by year-end.
Training Grok-3, the $6B cluster challenges OpenAI and Google.

Scale and Cost

Colossus 2 Becomes Fastest Growing AI Data Center In The World - CRE Daily

The build-out cost an estimated $6 billion for hardware alone, with cooling demands requiring 150 megawatts of power. This dwarfs competing clusters: Google's TPU v5p pods top out at 8,960 chips per unit, while Microsoft's planned 2027 cluster targets 100,000 GPUs. Colossus 2's density is achieved through direct liquid cooling and a custom InfiniBand fabric.

Competitive Implications

xAI is racing to close the compute gap with OpenAI and Google. The 300,000-GPU milestone gives it roughly 3x the raw FLOPS of OpenAI's current largest cluster, though efficiency depends on interconnect topology and model architecture. The unique take: xAI is betting that brute-force scaling still works for frontier models, even as competitors like Google push efficiency via sparse MoE and distillation. If Grok-3 matches GPT-5.5 on benchmarks like SWE-Bench or MATH, it will validate the scaling-only approach. If not, the $6B hardware spend will look like a bet on a fading paradigm.

Infrastructure Challenges

xAI Colossus supercomputer will soon run 500k GPUs - EONMSK News

Operating 300,000 GPUs requires solving reliability at unprecedented scale. Epoch AI notes that mean time between failures for GPU clusters drops non-linearly beyond 100,000 units. xAI has not published uptime data, but sources familiar with the build say redundancy is built into every rack with hot-swappable power supplies and redundant network paths. Memphis was chosen for its low energy costs and proximity to a TVA substation capable of 200 MW.

What to watch

Watch for Grok-3 benchmark scores on SWE-Bench and MATH by Q3 2026. If xAI publishes scaling laws data, it will reveal whether brute-force GPU stacking still beats algorithmic efficiency.

Source: news.google.com

Sources cited in this article

Epoch AI.

Source: gentic.news · 2d ago · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

The Colossus 2 announcement is a bet that scaling laws haven't saturated. While Google and OpenAI have pivoted to sparse models, distillation, and inference-time compute, xAI is doubling down on raw GPU count. The 300,000-GPU milestone is impressive but masks the real challenge: keeping utilization high. Most large clusters see 60-70% utilization due to straggler nodes and network contention. If xAI has solved this with custom InfiniBand and liquid cooling, it could produce a model competitive with GPT-5.5. If not, the $6B is a monument to diminishing returns. The choice of Memphis over data center hubs like Northern Virginia or Oregon signals a focus on power availability over latency—this is a training cluster, not an inference one.

#ai infrastructure #xai #training

This story is part of

The AI Infrastructure War Shifts from Chips to Developer Tools

Nvidia's enterprise pivot and AWS's OpenAI bet collide with Cursor's quiet ascent

Compare side-by-side

OpenAI vs Google

→

Mentioned in this article

Colossus 2 xAI Grok-3 OpenAI Google Epoch AI

Enjoyed this article?

Get the weekly AI intelligence briefing

✨AI Toolslive

Five one-click lenses on this article. Cached for 24h.

Pick a tool above to generate an instant lens on this article.

Opinion & Analysis3 shared topics

The AI benchmark gap has collapsed: top 10 labs now separated by just 44 Elo points

Products & Launches3 shared topics

Google DeepMind loses its third senior AI researcher in months as Nobel laureate John Jumper joins Anthropic

Big Tech2 shared topics

Noam Shazeer leaves Google for OpenAI after $2.7B Character.AI return

From the lab

The framework underneath this story

Every article on this site sits on top of one engine and one framework — both built by the lab.

Original research · EUMAS 2026

MNEMA — A Witness Lattice for Multi-Agent AI Memory

Cryptographic memory units · 1−α detection floor · 15 pp PDF

Field framework · v1.0

Epistemic Infrastructure

12 pillars · 11-stage knowledge metabolism · pathology catalog

Colossus 2: xAI's Memphis Cluster Hits 300,000 GPUs

Key Takeaways

Scale and Cost

Competitive Implications

Infrastructure Challenges

What to watch

Sources cited in this article

AI Analysis

✨AI Toolslive

Related Articles

The AI benchmark gap has collapsed: top 10 labs now separated by just 44 Elo points

Google Inks Pentagon AI Deal, Reverses 2018 Stance

OpenAI-Broadcom Chip Hints at Token Price Collapse

Thinking Tokens Drive Hidden Inference Costs in Agentic Pipelines

Google DeepMind loses its third senior AI researcher in months as Nobel laureate John Jumper joins Anthropic

Noam Shazeer leaves Google for OpenAI after $2.7B Character.AI return

The framework underneath this story

More in AI Research

World Action Models Survey Unifies 100+ Methods Under One Taxonomy

OPID: Agents Learn From Hindsight Without External Memory

NVIDIA Drops Fast-FoundationStereo: 10× Faster Depth Estimation