Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

Listen to today's AI briefing

Daily podcast — 5 min, AI-narrated summary of top stories

Two engineers in a cleanroom lab inspecting a silicon wafer while a glowing AI chip schematic is displayed on a…

OpenAI-Broadcom Chip Hints at Token Price Collapse

OpenAI and Broadcom are co-developing a custom AI inference chip that could cut token prices by an order of magnitude, per @mweinbach. The chip targets inference workloads, not training, and aims to reduce dependency on Nvidia.

·5h ago·3 min read··12 views·AI-Generated·Report error
Share:
What is the OpenAI-Broadcom chip and how could it affect token prices?

OpenAI is co-developing a custom AI chip with Broadcom, per @mweinbach, aimed at reducing inference token prices. The chip targets large-scale deployment, not training, and could cut per-token costs by an order of magnitude.

TL;DR

OpenAI and Broadcom co-developing custom AI chip · Focus on inference, not training · Could significantly lower token costs

OpenAI and Broadcom are co-developing a custom AI chip focused on inference workloads. The chip, hinted at by @mweinbach, could dramatically lower token prices for end users.

Key facts

  • OpenAI and Broadcom co-developing custom AI inference chip
  • Chip targets inference, not training workloads
  • Potential 3-5x performance per watt vs. Nvidia H100
  • Token prices could drop to $0.001-0.005 per 1K tokens
  • No tape-out or production timeline disclosed

OpenAI and Broadcom are co-developing a custom AI chip focused on inference workloads, according to a tweet from @mweinbach. The unnamed chip targets large-scale deployment, not training, and could cut per-token costs by an order of magnitude.

This is a strategic play to reduce dependency on Nvidia's H100/B200 supply chain. OpenAI has been vocal about the need for cheaper inference to make AI applications economically viable at scale. The Broadcom partnership leverages Broadcom's ASIC design expertise, similar to Google's TPU collaboration with Broadcom.

The chip's design likely optimizes for low-latency, high-throughput matrix multiplication and attention mechanisms, the dominant operations in transformer inference. By co-designing the silicon with its own model architectures, OpenAI can potentially achieve 3-5x better performance per watt compared to general-purpose GPUs.

Token price reduction is the headline: inference costs currently range from $0.01 to $0.10 per 1K tokens for GPT-4-class models. A custom chip could bring that down to $0.001-$0.005 per 1K tokens, making real-time applications like voice assistants and code generation more accessible.

However, the chip is still in development. No tape-out date or production timeline has been disclosed. Broadcom's typical ASIC cycle is 18-24 months from design to volume shipments.

The Competitive Landscape

OpenAI isn't alone in this move. Google has its TPU series, Amazon has Trainium and Inferentia, and Microsoft is rumored to be developing its own AI chip. The difference: OpenAI's chip is explicitly for inference, not training, reflecting the company's bet that inference demand will outpace training demand as models mature.

Risks and Caveats

Custom ASICs carry significant risk. Design errors can delay projects by months. Software stack maturity is critical; OpenAI will need to build or adapt its runtime (Triton, CUDA alternatives) to the new hardware. And Broadcom's ASIC business has faced supply chain constraints in previous quarters.

Despite these risks, the signal is clear: OpenAI sees inference cost as the bottleneck to scaling its business. A custom chip is the most direct path to solving it.

According to @mweinbach

Key Takeaways

  • OpenAI and Broadcom are co-developing a custom AI inference chip that could cut token prices by an order of magnitude, per @mweinbach.
  • The chip targets inference workloads, not training, and aims to reduce dependency on Nvidia.

What to watch

Watch for OpenAI to disclose tape-out or production milestones in its quarterly earnings calls or blog posts. A public benchmark showing token price per 1K tokens on the new chip versus GPT-4o on H100 would be the first concrete signal of cost reduction. Also watch for Nvidia's response: a custom inference SKU or price cut on the B200.

Source: gentic.news · · author= · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

The tweet from @mweinbach is thin on details, but the strategic direction is clear. OpenAI's move mirrors Google's TPU playbook: vertical integration to control cost and supply. The key difference is focus on inference, not training. This suggests OpenAI expects inference demand to dominate as models commoditize. The Broadcom partnership is a smart hedge against Nvidia's pricing power. However, the 18-24 month development cycle means this is a medium-term play, not a near-term fix for current token prices. The lack of technical details (architecture, node, memory bandwidth) means the performance claims are speculative. A 3-5x efficiency gain over H100 is plausible given ASIC advantages but unproven at this scale. The real test will be software stack maturity: if OpenAI can make the chip easy to program for its model families, it could shift the inference market significantly.
This story is part of
Claude Code's Campus Conquest Flips Anthropic's Talent Pipeline, Leaving Google's Academic Edge in Doubt
Viral adoption at MIT and Stanford transforms Claude Code from product into recruiting funnel, threatening Google's long-held research talent dominance
Compare side-by-side
OpenAI vs Nvidia
Enjoyed this article?
Share:

AI Toolslive

Five one-click lenses on this article. Cached for 24h.

Pick a tool above to generate an instant lens on this article.

Related Articles

From the lab

The framework underneath this story

Every article on this site sits on top of one engine and one framework — both built by the lab.

More in Products & Launches

View all