Ethan Mollick, a professor at Wharton and prominent AI commentator, has succinctly framed a critical bottleneck facing the artificial intelligence industry: a compute double bind that threatens both current operations and future development.
What Happened
In a recent post, Mollick outlined the two-pronged problem created by severe compute constraints—the shortage of specialized AI chips and data center capacity needed to run and train large language models.
On the inference side (running existing models for users), companies face three unpalatable choices:
- Raise prices for API calls and services
- Ration use through rate limits or access tiers (see the sketch after this list)
- Serve worse models that require less computational power
All three options hurt current growth by making AI services more expensive, less accessible, or lower quality.
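To make the rationing option concrete, here is a minimal sketch of how a provider might enforce tiered access with a token-bucket rate limiter. The tier names, quotas, and refill rates are illustrative assumptions, not any specific vendor's published limits.

```python
import time
from dataclasses import dataclass

# Illustrative quotas only; real providers publish their own tiers and limits.
TIER_LIMITS = {
    "free":       {"capacity": 10,  "refill_per_sec": 0.2},   # ~12 requests/minute
    "pro":        {"capacity": 60,  "refill_per_sec": 2.0},
    "enterprise": {"capacity": 600, "refill_per_sec": 20.0},
}

@dataclass
class TokenBucket:
    capacity: float
    refill_per_sec: float
    tokens: float = 0.0
    last_refill: float = 0.0

    def __post_init__(self):
        self.tokens = self.capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then admit the request if tokens remain."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per customer and tier; a rejected call maps to an HTTP 429 in practice.
bucket = TokenBucket(**TIER_LIMITS["free"])
print(bucket.allow())  # True until the burst allowance is spent
```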
On the training side, the constraint is even more fundamental: companies cannot train the next generation of models at the scale needed to stay competitive. This directly hurts future growth by slowing the pace of capability improvements.
Context: The Compute Crunch
This analysis comes amid what industry observers have called "the great GPU drought." The demand for Nvidia's H100 and Blackwell architecture chips has far outstripped supply, creating waiting lists measured in quarters for major cloud providers and AI labs.
The constraint isn't just about chips—it's about the entire infrastructure: power availability for data centers, cooling systems, networking equipment, and the physical space to house these increasingly massive compute clusters.
The Business Impact
For AI companies, this amounts to a Catch-22:
- If they prioritize current users with better inference, they sacrifice future competitiveness
- If they prioritize future training, they degrade the current user experience and sacrifice revenue
- There's no easy middle ground when compute is fundamentally scarce
This dynamic explains several observable industry trends:
- API price increases from major providers throughout 2025
- Increased rate limiting and usage caps even for enterprise customers
- Delayed model releases as training schedules slip
- Focus on inference optimization techniques like speculative decoding and quantization (a minimal quantization sketch follows this list)
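Quantization is one of the efficiency levers named above: storing weights in low-precision integers so the same model fits in less memory and runs on cheaper hardware. Below is a minimal sketch of symmetric int8 weight quantization, intended as an illustration rather than any particular framework's implementation.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float weights onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

# A toy weight matrix stands in for one layer of a model.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # small, at 4x less memory per weight
```

The trade-off is exactly the one the compute crunch forces: a small accuracy loss in exchange for serving more requests on the same hardware.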
gentic.news Analysis
Mollick's double-bind framework perfectly captures the strategic dilemma facing every major AI player in 2026. This isn't a temporary supply chain issue—it's becoming a structural constraint on the industry's growth trajectory.
The inference problem is already visible in the market. As we covered in our analysis of Anthropic's Claude 3.5 Sonnet pricing changes, companies are passing costs to customers. OpenAI's gradual rollout of o1-series models and Google's careful management of Gemini Ultra access both reflect rationing strategies. The "serve worse models" option manifests as providers defaulting users to smaller, less capable models during peak loads—a degradation of service quality that directly impacts developer trust.
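The peak-load downgrading described above can be sketched as a simple routing policy: send traffic to the flagship model until utilization crosses a threshold, then fall back to a cheaper model. The model names, threshold values, and the single utilization signal here are assumptions for illustration, not any provider's actual logic.

```python
from dataclasses import dataclass

# Hypothetical model tiers; real deployments weigh many more signals
# (queue depth, GPU memory pressure, per-customer SLAs) than one number.
FLAGSHIP = "flagship-large"
FALLBACK = "efficient-small"

@dataclass
class RoutingPolicy:
    overload_threshold: float = 0.85  # fraction of cluster capacity in use

    def choose_model(self, cluster_utilization: float, user_tier: str) -> str:
        # Premium customers keep the flagship model longer; everyone else
        # is downgraded first when capacity runs short.
        if cluster_utilization < self.overload_threshold:
            return FLAGSHIP
        if user_tier == "enterprise" and cluster_utilization < 0.95:
            return FLAGSHIP
        return FALLBACK

policy = RoutingPolicy()
print(policy.choose_model(0.60, "free"))        # flagship-large
print(policy.choose_model(0.90, "free"))        # efficient-small
print(policy.choose_model(0.90, "enterprise"))  # flagship-large
```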
The training constraint is more insidious but potentially more damaging. Our tracking of model release timelines shows clear slippage: GPT-5 arrived later than expected, Gemini 2.0's training was reportedly paused multiple times, and several open-source efforts have scaled back their ambitions. The rumored 100-trillion-parameter models that were supposed to arrive in 2026 now look like 2027-2028 prospects at best.
This creates unusual competitive dynamics. Normally, capital-rich companies like Google, Microsoft, and Meta would have an insurmountable advantage. But when even they can't get enough GPUs, smaller players with unique architectures or training methods might find openings. We're seeing increased investment in alternative compute (optical processors, neuromorphic chips) and radical efficiency improvements (like JEPA-based architectures that require less training data). The company that cracks the efficiency code—achieving GPT-4 level performance with 10x less compute—could leapfrog the current leaders.
The most likely near-term outcome is a bifurcated market: premium, compute-intensive AI for those who can pay, and lightweight, optimized models for everyone else. This could slow the democratization of AI capabilities that many predicted just two years ago.
Frequently Asked Questions
What are compute constraints in AI?
Compute constraints refer to the limited availability of specialized hardware (primarily GPUs like Nvidia's H100), data center capacity, and electrical power needed to run (inference) and train large AI models. This scarcity has created bottlenecks affecting every major AI company.
How are AI companies responding to compute shortages?
Companies are employing three main strategies: raising prices for API access, implementing usage rationing through rate limits, and sometimes serving less capable models during peak demand. For training, many are delaying next-generation models, optimizing existing architectures for efficiency, and investing in alternative hardware solutions.
Will the compute shortage get better or worse?
Most analysts expect constraints to persist through at least 2027. While chip manufacturers are ramping production, demand continues to grow faster. New fabrication plants take years to build, and AI model sizes are increasing exponentially. Some relief may come from architectural improvements that make models more efficient rather than just bigger.
How does this affect AI startups versus large tech companies?
Large companies with long-term chip contracts and their own data centers have an advantage but still face constraints. Startups without guaranteed GPU access struggle more severely, often waiting months for cloud capacity. This dynamic could consolidate power among established players unless alternative compute architectures or efficiency breakthroughs level the playing field.