Google unveiled a new inference TPU with a Broadfly network topology at Cloud Next in Las Vegas. The design scales a single pod to 1,152 chips — 4.5x larger than the previous Ironwood generation.
Key facts
- 1,152 TPUs per pod with Broadfly topology.
- 4.5x larger pod than Ironwood generation.
- Maximum 7 hops between any two TPUs.
- Unveiled at Google Cloud Next in Las Vegas.
- Focus is inference, not training.
At Google Cloud Next in Las Vegas, Google detailed a new inference-focused TPU that abandons the traditional torus and fat-tree interconnects. The chip uses a novel topology called "Broadfly," first described by @SemiAnalysis_.
How Broadfly works
Broadfly is a high-radix network design that packs more direct connections per TPU, reducing the number of hops data must traverse. In a 1,152-chip pod, the maximum hop count between any two TPUs is just 7 — a stark contrast to Ironwood's larger network diameter. This tighter coupling cuts inference latency for models that require cross-chip communication, such as mixture-of-experts (MoE) architectures or large-scale transformer serving.
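To see why high radix buys low diameter, a useful yardstick is the Moore bound: the theoretical maximum number of nodes a network can span when every node has k links and no pair is more than d hops apart. The sketch below is illustrative only; Google has not disclosed Broadfly's per-chip link count, so the radix values are assumptions.

```python
# Moore bound: upper limit on nodes reachable within d hops when every
# node has k links. Broadfly's real wiring is undisclosed; this sketch
# only shows why high radix and low diameter go together.

def moore_bound(k: int, d: int) -> int:
    """Max nodes in a radix-k network with diameter at most d hops."""
    total = 1       # the starting node itself
    frontier = k    # nodes reachable in exactly 1 hop
    for _ in range(d):
        total += frontier
        frontier *= k - 1  # each frontier node fans out over k-1 new links
    return total

for k in (3, 4, 8, 16):
    print(f"radix {k:2d}, diameter 7: up to {moore_bound(k, 7):,} chips")
```

Radix 3 tops out at 382 chips, while radix 4 already clears 4,000 in theory. Real topologies sit well below the Moore bound once routing slack and fault tolerance are accounted for, so Broadfly's actual radix is presumably much higher; the point is only that a 7-hop diameter at 1,152 chips is well within reach of a modest per-chip link count.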
Why this matters
The 4.5x pod-size increase over Ironwood is not merely a density play. By keeping hop count low at scale, Google can serve models with larger memory footprints (think 1T+ parameter MoEs) without hitting the communication bottlenecks that plague ring-based topologies. The notable claim: Broadfly inverts the traditional trade-off between scale and latency. Most interconnects force a choice between big pods with high diameter and small pods with low latency. Google's design claims both.
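A back-of-envelope latency comparison makes the inverted trade-off concrete. The per-hop figure below is an assumed placeholder, not a disclosed number, and the ring and torus diameters are textbook values for those shapes at 1,152 nodes:

```python
# Worst-case hop latency for three 1,152-chip topologies. The 100 ns
# per-hop figure is an assumption for illustration; Google has not
# disclosed link latency for this TPU.

PER_HOP_NS = 100  # assumed

topologies = {
    "1D ring":                1152 // 2,                   # diameter N/2
    "3D torus (8 x 12 x 12)": 8 // 2 + 12 // 2 + 12 // 2,  # sum of half-dims
    "Broadfly (stated)":      7,                           # from the announcement
}

for name, hops in topologies.items():
    print(f"{name:24s} {hops:4d} hops  ~{hops * PER_HOP_NS:,} ns worst case")
```

Against a ring, the stated 7-hop diameter is roughly an 80x reduction in worst-case hops; even against a 3D torus of the same size it is better than 2x.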
Competitive context
NVIDIA's NVLink-based DGX systems top out at 576 GPUs per NVLink domain (Blackwell generation), with a maximum of 8 hops in a 576-GPU Dragonfly+ topology. Broadfly's 1,152-chip pod with a 7-hop diameter doubles the scale while keeping the network diameter comparable or better. Google did not disclose per-chip bandwidth or whether the TPU uses optical interconnects; those details will likely surface in a paper or at a future conference.
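For a side-by-side view of the publicly quoted figures, a minimal tabulation (scale and diameter only, since per-link bandwidth and latency are undisclosed for both systems):

```python
# Publicly quoted scale and diameter only; bandwidth and per-hop latency
# are undisclosed for both systems, so no end-to-end latency claim is made.

systems = [
    ("Broadfly TPU pod",      1152, 7),
    ("NVIDIA NVLink domain",   576, 8),
]

for name, chips, diameter in systems:
    print(f"{name:22s} {chips:5,d} chips  diameter {diameter} hops")

print(f"\nscale ratio: {1152 / 576:.1f}x the chips at one fewer worst-case hop")
```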
Inference-first design
The focus on inference, not training, signals Google's intent to capture the growing model-serving market. As inference workloads shift toward larger batch sizes and longer context windows (e.g., the million-token contexts of Gemini and Claude), network topology becomes a first-order latency factor. Broadfly's high-radix, low-diameter design is purpose-built for this regime.
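To see why long contexts push serving toward pod-scale memory, here is a rough KV-cache estimate for a hypothetical dense transformer. Every model dimension below is an assumption chosen for illustration; none describes a specific Google or Anthropic model.

```python
# Rough KV-cache sizing for long-context serving. All model dimensions
# are hypothetical; they are not the specs of any named model.

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_value: int = 2) -> float:
    """KV cache for one sequence: 2 (K and V) x layers x heads x dim x len."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value / 2**30

# Assumed model: 80 layers, 8 KV heads (GQA), head_dim 128, bf16 cache.
per_seq = kv_cache_gib(layers=80, kv_heads=8, head_dim=128, seq_len=1_000_000)
print(f"KV cache, one 1M-token sequence: ~{per_seq:.0f} GiB")
print(f"KV cache, batch of 32:           ~{32 * per_seq:.0f} GiB")
```

Under these assumptions a single million-token sequence needs roughly 300 GiB of cache, and a modest batch runs to terabytes, far beyond any single chip's HBM. Sharding that cache across a pod is what makes the interconnect's hop count a first-order serving concern.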
What to watch
Watch for Google to publish a detailed paper on Broadfly's routing algorithm and per-chip bisection bandwidth at ISCA or SC26. Also track whether Google Cloud offers the 1,152-chip pod as a single reservation unit — that would signal enterprise inference demand.