Nvidia's $20 billion Groq acquihire in December 2025 signaled that inference workloads are reshaping the AI chip market. For startups vying for a slice of Nvidia's pie, it's now or never.
Key facts
- Nvidia acquired Groq for $20 billion in December 2025.
- Lumai targets 1 exaOPS in 10kW power budget by 2029.
- AWS uses Trainium for prefill, Cerebras for decode.
- Intel partners with SambaNova for decode reference design.
- Lumai runs Llama 3.1 8B and 70B models today.
AI adoption is reaching an inflection point as the focus shifts from training new models to serving them. Compared to training, inference is a far more diverse workload, presenting an opportunity for chip startups to carve out a niche: large-batch inference requires a different mix of compute, memory, and bandwidth than a latency-sensitive AI assistant or code agent does. [According to The Register]
Because of this, inference has become increasingly heterogeneous, with some stages better suited to GPUs and others to specialized hardware. Nvidia's $20 billion acquihire of Groq in December is a prime example. Groq's SRAM-heavy chip architecture could churn out tokens faster than any GPU, but limited compute capacity and aging process technology meant it couldn't scale efficiently. Nvidia sidestepped this by moving compute-heavy prefill to GPUs while keeping bandwidth-constrained decode on Groq's LPUs. [Per the source]
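To make that split concrete, here is a minimal Python sketch of the disaggregated pattern; the function names and stand-in model are hypothetical illustrations, not Nvidia's actual software stack. Prefill runs once on the compute-heavy device, then the decode loop streams tokens on the bandwidth-optimized device, with the attention KV cache handed between stages.

```python
# Minimal sketch of disaggregated inference (hypothetical names, not Nvidia's
# actual stack): compute-heavy prefill on one device, bandwidth-bound decode
# on another, with the attention KV cache handed off between them.

from dataclasses import dataclass

@dataclass
class KVCache:
    """Opaque handle to the key/value cache produced by prefill."""
    tokens: list

def prefill(prompt_tokens: list) -> KVCache:
    # Compute-bound: the whole prompt is processed in parallel, so this
    # stage wants raw FLOPs; in the pairing above it lands on GPUs.
    return KVCache(tokens=list(prompt_tokens))

def decode_step(cache: KVCache) -> int:
    # Bandwidth-bound: each new token re-reads the weights plus the growing
    # cache; SRAM-heavy parts like Groq's LPUs are strongest here.
    next_token = (sum(cache.tokens) * 31) % 50_000  # stand-in for a model
    cache.tokens.append(next_token)
    return next_token

def generate(prompt_tokens: list, max_new_tokens: int) -> list:
    cache = prefill(prompt_tokens)  # stage 1: prefill device
    return [decode_step(cache) for _ in range(max_new_tokens)]  # stage 2

print(generate([1, 2, 3], max_new_tokens=5))
```

The KV-cache handoff is the crux of the design: whichever device holds the cache owns the decode loop, which is why these pairings dedicate separate silicon to token streaming.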
Disaggregated compute becomes the norm
This combination isn't unique to Nvidia. AWS announced a disaggregated compute platform using its Trainium accelerators for prefill and Cerebras Systems' wafer-scale accelerators for decode, and Intel announced a reference design using GPUs for prefill and SambaNova's new RDUs for decode. So far, most chip startups' wins have been on the decode side, where SRAM's bandwidth advantage shines. [The Register reports]
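Some rough roofline arithmetic shows why decode rewards memory bandwidth; the figures below are assumed round numbers for illustration, not vendor specifications.

```python
# Back-of-envelope numbers (assumed, illustrative figures, not vendor specs).
# Single-stream decode must re-read the full weight set for every generated
# token, so tokens/s is capped by memory bandwidth.

weight_bytes = 70e9 * 2   # ~70B parameters at FP16, 2 bytes each
hbm_bw = 3.3e12           # ~3.3 TB/s, an assumed HBM-class bandwidth
sram_bw = 80e12           # ~80 TB/s, an assumed aggregate on-chip SRAM bandwidth

print(f"HBM-bound decode:  {hbm_bw / weight_bytes:5.0f} tokens/s")   # ~24
print(f"SRAM-bound decode: {sram_bw / weight_bytes:5.0f} tokens/s")  # ~571

# Prefill amortizes one weight read across the whole prompt, so a
# 2,048-token prompt raises arithmetic intensity ~2,048x and the stage
# flips from bandwidth-bound to compute-bound, hence the GPU pairing.
```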
Optical inference enters the fray
This week, UK-based startup Lumai detailed its optical inference accelerator, which uses light instead of electrons to perform matrix multiplication at a fraction of the power of digital architectures. Lumai expects its next-gen Iris Tetra systems to achieve one exaOPS of AI performance within a 10kW power budget by 2029. Initially, the chip is positioned as a standalone alternative to GPUs for compute-bound inference workloads like batch processing; longer-term, the company plans to use its optical accelerators as prefill processors. The architecture currently runs billion-parameter models like Llama 3.1 8B and 70B. [According to the source]
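Taken at face value, Lumai's target implies an efficiency figure that simple arithmetic makes easy to sanity-check; the GPU baseline below is an assumed round number, not a measured benchmark.

```python
# Implied efficiency of the stated target (1 exaOPS in 10 kW), compared
# against an assumed, illustrative baseline for a current digital accelerator.

target_ops = 1e18        # 1 exaOPS
target_power_w = 10e3    # 10 kW

implied = target_ops / target_power_w
print(f"Implied efficiency: {implied / 1e12:.0f} TOPS/W")  # 100 TOPS/W

gpu_tops_per_w = 4e12    # assumption: ~4 TOPS/W for a modern digital accelerator
print(f"vs. assumed GPU baseline: {implied / gpu_tops_per_w:.0f}x")  # 25x
```

Under that assumed baseline, the target works out to roughly 25x today's digital parts, which is the gap optical compute is betting it can close.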
Key takeaways
- Inference shift from training to serving creates opportunities for AI chip startups.
- Nvidia's $20B Groq acquihire validates disaggregated compute strategies.
What to watch
Watch for Nvidia's next-generation Rubin architecture and whether it integrates disaggregated inference natively, potentially closing the window for startups. Also track Lumai's Iris Tetra tape-out timeline and customer adoption in 2027.