OpenAI and Broadcom are co-developing a custom AI chip focused on inference workloads. The chip, hinted at by @mweinbach, could dramatically lower token prices for end users.
Key facts
- OpenAI and Broadcom co-developing custom AI inference chip
- Chip targets inference, not training workloads
- Potential 3-5x performance per watt vs. Nvidia H100
- Token prices could drop to $0.001-0.005 per 1K tokens
- No tape-out or production timeline disclosed
OpenAI and Broadcom are co-developing a custom AI chip focused on inference workloads, according to a tweet from @mweinbach. The unnamed chip targets large-scale deployment, not training, and could cut per-token costs by an order of magnitude.
This is a strategic play to reduce dependency on Nvidia's H100/B200 supply chain. OpenAI has been vocal about the need for cheaper inference to make AI applications economically viable at scale. The Broadcom partnership leverages Broadcom's ASIC design expertise, similar to Google's TPU collaboration with Broadcom.
The chip's design likely optimizes for low-latency, high-throughput matrix multiplication and attention mechanisms, the dominant operations in transformer inference. By co-designing the silicon with its own model architectures, OpenAI can potentially achieve 3-5x better performance per watt compared to general-purpose GPUs.
Token price reduction is the headline: inference costs currently range from $0.01 to $0.10 per 1K tokens for GPT-4-class models. A custom chip could bring that down to $0.001-$0.005 per 1K tokens, making real-time applications like voice assistants and code generation more accessible.
However, the chip is still in development. No tape-out date or production timeline has been disclosed. Broadcom's typical ASIC cycle is 18-24 months from design to volume shipments.
The Competitive Landscape
OpenAI isn't alone in this move. Google has its TPU series, Amazon has Trainium and Inferentia, and Microsoft is rumored to be developing its own AI chip. The difference: OpenAI's chip is explicitly for inference, not training, reflecting the company's bet that inference demand will outpace training demand as models mature.
Risks and Caveats
Custom ASICs carry significant risk. Design errors can delay projects by months. Software stack maturity is critical; OpenAI will need to build or adapt its runtime (Triton, CUDA alternatives) to the new hardware. And Broadcom's ASIC business has faced supply chain constraints in previous quarters.
Despite these risks, the signal is clear: OpenAI sees inference cost as the bottleneck to scaling its business. A custom chip is the most direct path to solving it.
Key Takeaways
- OpenAI and Broadcom are co-developing a custom AI inference chip that could cut token prices by an order of magnitude, per @mweinbach.
- The chip targets inference workloads, not training, and aims to reduce dependency on Nvidia.
What to watch
Watch for OpenAI to disclose tape-out or production milestones in its quarterly earnings calls or blog posts. A public benchmark showing token price per 1K tokens on the new chip versus GPT-4o on H100 would be the first concrete signal of cost reduction. Also watch for Nvidia's response: a custom inference SKU or price cut on the B200.









