
Google Splits TPU Line: 8t for Training, 8i for Inference

Key Takeaways

  • At Cloud Next 2026, Google introduced two new AI chips — TPU 8t for training and TPU 8i for inference — splitting its custom silicon for the first time.
  • OpenAI, Anthropic, and Meta are buying multi-gigawatt TPU capacity, signaling a crack in NVIDIA's 81% market share.

What Happened

Google just broke a decade-long tradition. At Cloud Next 2026, the company unveiled not one, but two new AI chips: the TPU 8t for training and TPU 8i for inference. For the first time ever, Google is splitting its custom silicon into specialized architectures instead of relying on a one-size-fits-all design.

Technical Details

TPU 8t Superpod

  • Scale: 9,600 liquid-cooled chips per superpod
  • Compute: 121 FP4 ExaFlops peak, roughly a 3x leap over the previous generation (see the back-of-envelope sketch after this list)
  • Design partner: Broadcom co-designed the TPU 8t
  • Fabrication: TSMC
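
Google did not quote a per-chip figure, but the superpod numbers imply one. A back-of-envelope sketch, assuming the 121 ExaFlops is aggregate FP4 peak across all 9,600 chips (the announcement implies this but does not spell it out):

```python
# Back-of-envelope math from the announced superpod figures only.
# Assumption: 121 ExaFlops is aggregate FP4 peak for the full
# 9,600-chip superpod; Google has not published per-chip specs.
SUPERPOD_EXAFLOPS_FP4 = 121
CHIPS_PER_SUPERPOD = 9_600
PREV_GEN_EXAFLOPS_FP4 = 40  # estimated baseline implied by the "~3x leap"

per_chip_pflops = SUPERPOD_EXAFLOPS_FP4 * 1_000 / CHIPS_PER_SUPERPOD
gen_leap = SUPERPOD_EXAFLOPS_FP4 / PREV_GEN_EXAFLOPS_FP4
print(f"Implied per-chip peak: ~{per_chip_pflops:.1f} PFLOPS FP4")  # ~12.6
print(f"Generational leap: ~{gen_leap:.1f}x")                       # ~3.0
```

That ~12.6 PFLOPS figure is theoretical FP4 peak, not sustained throughput on real training workloads.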

TPU 8i Inference Chip

  • Performance-per-dollar: 80% better than its predecessor
  • Memory: Triple the on-chip memory
  • Network: New Boardfly topology that cuts network latency in half (see the toy latency model after this list)
  • Design partner: MediaTek handles the TPU 8i
  • Fabrication: TSMC
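
The "latency cut in half" claim applies to the network links, not necessarily to end-to-end serving latency. A toy model of multi-chip inference, with every number an illustrative placeholder (Google has published no Boardfly latency figures), shows why the two differ:

```python
# Toy model of per-token latency in multi-chip inference. ALL numbers
# are illustrative placeholders; Google has not published Boardfly
# latency figures or TPU 8i compute times.
def token_latency_us(compute_us: float, hops: int, link_us: float) -> float:
    """Per-token latency: on-chip compute plus serialized network hops."""
    return compute_us + hops * link_us

compute_us = 200.0   # hypothetical per-token compute time
hops = 16            # hypothetical number of chip-to-chip hops per token
old_link_us = 10.0   # hypothetical per-hop latency, previous topology

old = token_latency_us(compute_us, hops, old_link_us)
new = token_latency_us(compute_us, hops, old_link_us / 2)  # links halved
print(f"{old:.0f} us -> {new:.0f} us per token ({1 - new / old:.0%} faster)")
# 360 us -> 280 us per token (22% faster)
```

In this made-up configuration, halving per-hop latency yields about a 22% per-token speedup, because compute time is untouched; the real benefit scales with how network-bound a given deployment is.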

The Bigger Picture

The most significant aspect: Anthropic, Meta, and now OpenAI are buying multi-gigawatt allocations of TPU capacity. OpenAI booking Google silicon is the first visible crack in NVIDIA's grip on frontier AI training.

NVIDIA still holds 81% of the AI chip market, but the era of serious competition has officially begun.

How It Compares

| Metric | TPU 8t | TPU 8i | Previous generation |
| --- | --- | --- | --- |
| Peak compute | 121 FP4 ExaFlops | — | ~40 FP4 ExaFlops (est.) |
| Performance/$ improvement | — | 80% better | Baseline |
| On-chip memory | — | 3x | Baseline |
| Network latency | — | Halved | Baseline |
| Specialization | Training | Inference | Unified |

What This Means in Practice

For AI teams training large models, the TPU 8t offers a 3x compute jump in a single superpod, potentially reducing training time from months to weeks. For inference workloads, TPU 8i's 80% better performance-per-dollar could meaningfully lower serving costs for high-throughput applications like chatbots and code assistants.
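
One arithmetic subtlety worth flagging: "80% better performance-per-dollar" is not the same as "80% cheaper." If the claim means 1.8x the work per dollar (our reading, since Google's underlying benchmark and pricing are unpublished), a fixed serving workload gets roughly 44% cheaper:

```python
# Converting a performance-per-dollar claim into a cost reduction.
# Assumption: "80% better perf/$" means 1.8x the work per dollar;
# the underlying benchmark and pricing have not been published.
perf_per_dollar_gain = 0.80

relative_cost = 1 / (1 + perf_per_dollar_gain)  # cost per unit of work
print(f"Cost per query vs. predecessor: {relative_cost:.0%}")      # 56%
print(f"Implied serving-cost reduction: {1 - relative_cost:.0%}")  # 44%
```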

gentic.news Analysis

This move by Google is a strategic pivot that has been building for years. We previously covered Google's TPU v5p launch in late 2023, which focused on unified design. The split into specialized training and inference chips mirrors a trend we've seen across the industry: NVIDIA's H100 and B200 also have distinct training/inference profiles, but Google is taking it further with entirely separate architectures.

The fact that OpenAI — historically an NVIDIA customer — is now buying TPU capacity is the real headline. We reported on OpenAI's $100 billion Stargate data center project last year, which was rumored to be NVIDIA-heavy. This TPU deal suggests OpenAI is hedging its supply chain, a wise move given NVIDIA's 81% market share and the ongoing GPU shortage.

Anthropic and Meta already had relationships with Google Cloud, but Meta's involvement is notable given its parallel work on custom MTIA chips. This suggests even companies building their own silicon still need external capacity.

The Broadcom and MediaTek partnerships are also telling. Google is not vertically integrating chip design entirely — it's leveraging external expertise while owning the architecture. This is a playbook Amazon also uses with Annapurna Labs and its Trainium/Inferentia chips.

Frequently Asked Questions

What is the difference between TPU 8t and TPU 8i?

The TPU 8t is designed for training large AI models, with a 9,600-chip superpod delivering 121 FP4 ExaFlops. The TPU 8i is specialized for inference — running trained models — with 80% better performance-per-dollar and triple the on-chip memory of its predecessor.

Why is OpenAI buying Google TPUs?

OpenAI is diversifying its hardware supply chain away from exclusive reliance on NVIDIA GPUs, which hold 81% of the AI chip market. Booking multi-gigawatt TPU capacity gives OpenAI more leverage in pricing and availability negotiations, and guards against potential GPU shortages.

How do the new TPUs compare to NVIDIA's chips?

NVIDIA's H100 and B200 remain the market leaders, but the TPU 8t's 121 FP4 ExaFlops per superpod represents a roughly 3x compute leap over the previous TPU generation. The TPU 8i's 80% better performance-per-dollar could undercut NVIDIA on inference costs, though real-world benchmarks will be needed to confirm.

When will the TPU 8t and 8i be available?

Google announced the chips at Cloud Next 2026 but did not disclose availability timelines. Historically, Google has rolled out new TPUs to cloud customers within 6-12 months of announcement.

AI Analysis

The split into training and inference chips is a natural evolution. As AI models grow larger and inference becomes the dominant cost, specialized hardware for each phase makes engineering sense. The TPU 8i's triple on-chip memory is particularly important — it allows larger models to be served without relying on slower off-chip memory, directly addressing the memory bottleneck that plagues LLM serving.

The Boardfly topology halving network latency is a sleeper feature. For distributed inference across many chips, network latency is often the bottleneck. Halving it could enable more efficient model parallelism and reduce tail latency in production.

The 3x compute jump in the TPU 8t is impressive but comes with caveats. FP4 ExaFlops are a peak theoretical metric — real-world throughput depends on model architecture, batch size, and numerical stability. Google will need to demonstrate that FP4 training delivers comparable accuracy to FP8 or FP16, a challenge that NVIDIA and AMD are also tackling.

OpenAI's adoption is the strongest signal yet that the AI hardware market is diversifying. If frontier labs are willing to move away from NVIDIA's CUDA ecosystem, it validates that software moats can be overcome with sufficient investment in compiler stacks and framework adaptations.
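
The memory point above is easy to make concrete: serving working sets are dominated by model weights (plus KV cache), so FP4 weights and triple on-chip capacity attack the same bottleneck from both sides. A rough sketch with illustrative model sizes, since TPU 8i's actual on-chip capacity is undisclosed:

```python
# Bytes needed just to hold model weights at different precisions.
# Model sizes here are illustrative; TPU 8i's on-chip capacity and
# the models it targets are not disclosed. KV cache is excluded.
def weight_gib(params_billion: float, bits_per_weight: int) -> float:
    """Memory for weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

for params in (70, 400):
    row = ", ".join(
        f"FP{bits}: {weight_gib(params, bits):,.0f} GiB" for bits in (16, 8, 4)
    )
    print(f"{params}B params -> {row}")

# Expected output:
#   70B params -> FP16: 130 GiB, FP8: 65 GiB, FP4: 33 GiB
#   400B params -> FP16: 745 GiB, FP8: 373 GiB, FP4: 186 GiB
```

Each halving of weight precision halves the footprint, which is why lower-precision serving and larger on-chip memory compound rather than merely add.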