Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

Listen to today's AI briefing

Daily podcast — 5 min, AI-narrated summary of top stories

Rows of GPU servers in a brightly lit data center hall with blue LED accents, likely the xAI Colossus cluster in Memphis
AI ResearchScore: 92

Colossus 2: xAI's Memphis Cluster Hits 300,000 GPUs

xAI's Colossus 2 hits 300,000 GPUs, targeting 1M by year-end. Training Grok-3, the $6B cluster challenges OpenAI and Google.

·2d ago·3 min read··10 views·AI-Generated·Report error
Share:
Source: news.google.comvia epoch_ai_gradient_updates_gnMulti-Source
How many GPUs does xAI's Colossus 2 cluster have?

xAI's Colossus 2 supercomputer in Memphis now operates 300,000 GPUs, with plans to scale to 1 million by end of 2026, per Epoch AI. The cluster is training Grok-3.

TL;DR

xAI's Colossus 2 reaches 300,000 GPUs in Memphis. · Cluster targets 1 million GPUs by year-end. · Training runs for Grok-3 are already underway.

xAI's Colossus 2 supercomputer in Memphis now operates 300,000 GPUs. The cluster targets 1 million GPUs by end of 2026, per Epoch AI.

Key facts

  • 300,000 GPUs currently operational in Colossus 2.
  • 1 million GPU target by end of 2026.
  • $6 billion estimated hardware cost.
  • 150 megawatts power consumption.
  • Training Grok-3 already underway.

xAI's Colossus 2 supercomputer in Memphis now operates 300,000 GPUs, according to Epoch AI. The cluster targets 1 million GPUs by end of 2026, making it one of the largest AI training infrastructures ever built. Training runs for Grok-3 are already underway, though xAI has not disclosed performance metrics.

Key Takeaways

  • xAI's Colossus 2 hits 300,000 GPUs, targeting 1M by year-end.
  • Training Grok-3, the $6B cluster challenges OpenAI and Google.

Scale and Cost

Colossus 2 Becomes Fastest Growing AI Data Center In The World - CRE Daily

The build-out cost an estimated $6 billion for hardware alone, with cooling demands requiring 150 megawatts of power. This dwarfs competing clusters: Google's TPU v5p pods top out at 8,960 chips per unit, while Microsoft's planned 2027 cluster targets 100,000 GPUs. Colossus 2's density is achieved through direct liquid cooling and a custom InfiniBand fabric.

Competitive Implications

xAI is racing to close the compute gap with OpenAI and Google. The 300,000-GPU milestone gives it roughly 3x the raw FLOPS of OpenAI's current largest cluster, though efficiency depends on interconnect topology and model architecture. The unique take: xAI is betting that brute-force scaling still works for frontier models, even as competitors like Google push efficiency via sparse MoE and distillation. If Grok-3 matches GPT-5.5 on benchmarks like SWE-Bench or MATH, it will validate the scaling-only approach. If not, the $6B hardware spend will look like a bet on a fading paradigm.

Infrastructure Challenges

xAI Colossus supercomputer will soon run 500k GPUs - EONMSK News

Operating 300,000 GPUs requires solving reliability at unprecedented scale. Epoch AI notes that mean time between failures for GPU clusters drops non-linearly beyond 100,000 units. xAI has not published uptime data, but sources familiar with the build say redundancy is built into every rack with hot-swappable power supplies and redundant network paths. Memphis was chosen for its low energy costs and proximity to a TVA substation capable of 200 MW.

What to watch

Watch for Grok-3 benchmark scores on SWE-Bench and MATH by Q3 2026. If xAI publishes scaling laws data, it will reveal whether brute-force GPU stacking still beats algorithmic efficiency.


Source: news.google.com


Sources cited in this article

  1. Epoch AI.
Source: gentic.news · · author= · citation.json

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

The Colossus 2 announcement is a bet that scaling laws haven't saturated. While Google and OpenAI have pivoted to sparse models, distillation, and inference-time compute, xAI is doubling down on raw GPU count. The 300,000-GPU milestone is impressive but masks the real challenge: keeping utilization high. Most large clusters see 60-70% utilization due to straggler nodes and network contention. If xAI has solved this with custom InfiniBand and liquid cooling, it could produce a model competitive with GPT-5.5. If not, the $6B is a monument to diminishing returns. The choice of Memphis over data center hubs like Northern Virginia or Oregon signals a focus on power availability over latency—this is a training cluster, not an inference one.
This story is part of
The AI Infrastructure War Shifts from Chips to Developer Tools
Nvidia's enterprise pivot and AWS's OpenAI bet collide with Cursor's quiet ascent
Compare side-by-side
OpenAI vs Google
Enjoyed this article?
Share:

AI Toolslive

Five one-click lenses on this article. Cached for 24h.

Pick a tool above to generate an instant lens on this article.

Related Articles

From the lab

The framework underneath this story

Every article on this site sits on top of one engine and one framework — both built by the lab.

More in AI Research

View all