gentic.news — AI News Intelligence Platform


Startups · Score: 80

Inference shift opens door for AI chip startups to challenge Nvidia

Inference shift from training to serving creates opportunities for AI chip startups. Nvidia's $20B Groq acquihire validates disaggregated compute strategies.

2h ago · 3 min read · AI-Generated
Source: go.theregister.com via the_register_data_center · Corroborated
How are AI chip startups gaining a second chance in the inference market against Nvidia?

AI chip startups like Cerebras, SambaNova, and Lumai are targeting inference workloads as the market shifts away from training. They are betting on disaggregated architectures that split prefill and decode tasks, an approach validated by Nvidia's $20B Groq acquihire.

TL;DR

  • Focus shifts from training to inference workloads.
  • Nvidia's Groq acquihire shows a disaggregated compute strategy.
  • Startups target the decode niche with SRAM and optical tech.

Nvidia's $20 billion Groq acquihire in December 2025 signaled that inference workloads are reshaping the AI chip market. For startups vying for a slice of Nvidia's pie, it's now or never.

Key facts

  • Nvidia acquired Groq for $20 billion in December 2025.
  • Lumai targets 1 exaOPS in 10kW power budget by 2029.
  • AWS uses Trainium for prefill, Cerebras for decode.
  • Intel partners with SambaNova for decode reference design.
  • Lumai runs Llama 3.1 8B and 70B models today.

AI adoption is reaching an inflection point as the focus shifts from training new models to serving them. Compared to training, inference is a much more diverse workload, presenting an opportunity for chip startups to carve out a niche. Large batch inference requires a different mix of compute, memory, and bandwidth than an AI assistant or code agent. [According to The Register]

Because of this, inference has become increasingly heterogeneous, with some stages better suited to GPUs and others to specialized hardware. Nvidia's $20 billion acquihire of Groq in December is a prime example. Groq's SRAM-heavy chip architecture could churn out tokens faster than any GPU, but limited compute capacity and aging chip technology meant it couldn't scale efficiently. Nvidia side-stepped this by moving compute-heavy prefill to GPUs while keeping bandwidth-constrained decode on Groq's LPUs. [Per the source]
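The prefill/decode asymmetry described above can be made concrete with a back-of-envelope arithmetic-intensity calculation. This sketch uses illustrative model dimensions, not figures from the article: prefill amortizes each weight read over thousands of prompt tokens, while decode re-reads the whole weight matrix to emit a single token.

```python
def arithmetic_intensity(batch_tokens: int, d_model: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weights streamed for one d_model x d_model linear layer.

    High intensity -> compute-bound (good fit for GPUs);
    low intensity  -> bandwidth-bound (where SRAM-heavy chips shine).
    """
    flops = 2 * batch_tokens * d_model * d_model        # GEMM cost over the batch
    weight_bytes = d_model * d_model * bytes_per_param  # weights read once per pass
    return flops / weight_bytes

d = 4096  # hypothetical hidden size
prefill = arithmetic_intensity(batch_tokens=2048, d_model=d)  # whole prompt at once
decode = arithmetic_intensity(batch_tokens=1, d_model=d)      # one token per step

print(f"prefill: {prefill:.0f} FLOPs/byte")  # 2048 FLOPs/byte -> compute-bound
print(f"decode:  {decode:.0f} FLOPs/byte")   # 1 FLOP/byte -> bandwidth-bound
```

A roughly 2,000x gap in FLOPs per byte is why a single architecture struggles to serve both stages well.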

Disaggregated compute becomes the norm

This combination isn't unique to Nvidia. AWS announced a disaggregated compute platform using its Trainium accelerators for prefill and Cerebras Systems' wafer-scale accelerators for decode. Intel also announced a reference design using GPUs for prefill and SambaNova's new RDUs for decode. So far, most chip startups' wins have been on the decode side, where SRAM's speed advantage shines. [The Register reports]
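The split the vendors above are converging on can be sketched as a two-stage pipeline: a compute-optimized backend builds the KV cache from the prompt, then hands it to a bandwidth-optimized backend that streams tokens. Every class and method name here is illustrative, not a real vendor API.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Minimal stand-in for the attention cache handed between backends."""
    cached_tokens: int
    layers: int

class PrefillBackend:  # e.g. a GPU or wafer-scale part (compute-bound GEMMs)
    def run(self, prompt: list[str], layers: int = 32) -> KVCache:
        # One big batched pass over the whole prompt builds the cache.
        return KVCache(cached_tokens=len(prompt), layers=layers)

class DecodeBackend:  # e.g. an SRAM-heavy LPU/RDU (bandwidth-bound steps)
    def step(self, cache: KVCache) -> str:
        # Each step re-reads weights and cache to emit one token.
        cache.cached_tokens += 1  # the new token extends the cache
        return "<tok>"

def generate(prompt: list[str], max_new: int) -> list[str]:
    cache = PrefillBackend().run(prompt)                  # stage 1: prefill
    decoder = DecodeBackend()
    return [decoder.step(cache) for _ in range(max_new)]  # stage 2: decode

print(generate(["The", "inference", "market"], max_new=4))
```

The design point is the hand-off: the cache, not the model, crosses the interconnect, which is what lets each stage run on the silicon best suited to it.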

Optical inference enters the fray

This week, UK-based startup Lumai detailed its optical inference accelerator, which uses light instead of electrons to perform matrix multiplication at a fraction of the power of digital architectures. Lumai expects its next-gen Iris Tetra systems to achieve an exaOPS of AI performance in a 10kW power budget by 2029. Initially, the chip is positioned as a standalone alternative to GPUs for compute-bound inference workloads like batch processing. Longer-term, the company plans to use its optical accelerators as prefill processors. The architecture currently runs models like Llama 3.1 8B and 70B. [According to the source]
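Lumai's 2029 target implies an eye-catching efficiency figure. A quick sanity check, using a rough low-precision GPU baseline that is an assumption here, not a number from the article:

```python
# Lumai target: 1 exaOPS (1e18 ops/s) within a 10 kW budget.
target_ops = 1e18
target_watts = 10_000
optical_tops_per_w = target_ops / target_watts / 1e12  # -> 100 TOPS/W

# Assumed modern-GPU baseline: ~2e15 low-precision ops/s at ~700 W.
gpu_tops_per_w = 2e15 / 700 / 1e12  # ~2.9 TOPS/W

print(f"optical target: {optical_tops_per_w:.0f} TOPS/W")
print(f"GPU baseline:   {gpu_tops_per_w:.1f} TOPS/W")
print(f"ratio: ~{optical_tops_per_w / gpu_tops_per_w:.0f}x")  # ~35x
```

Under these assumptions the target works out to roughly a 35x efficiency gain over today's GPUs, though it hinges on the 2029 hardware landing as promised.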

Key Takeaways

  • Inference shift from training to serving creates opportunities for AI chip startups.
  • Nvidia's $20B Groq acquihire validates disaggregated compute strategies.

What to watch

Watch for Nvidia's next-generation Rubin architecture and whether it integrates disaggregated inference natively, potentially closing the window for startups. Also track Lumai's Iris Tetra tape-out timeline and customer adoption in 2027.



AI-assisted reporting. Generated by gentic.news from 2 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala AYADI.


AI Analysis

The inference market is structurally different from training, and Nvidia's dominance faces a more credible challenge here. The disaggregated compute model—splitting prefill and decode—is a direct admission that no single architecture is optimal for all inference tasks. Nvidia's Groq acquihire was defensive: it bought the fastest token-generation engine to prevent a competitor from owning the decode niche. But AWS and Intel are now replicating the playbook with Cerebras and SambaNova, respectively. Lumai's optical approach is longer-term but addresses the power wall—an exaOPS in 10kW would be well over an order of magnitude more efficient than current GPUs. The key question is whether startups can scale manufacturing and software ecosystems before Nvidia integrates disaggregation into its own roadmap.
