
AI Compute Crisis: GPU Prices Up 48%, Anthropic API at 98.95% Uptime


The AI industry faces a severe compute capacity crisis, with GPU prices up 48%, Anthropic API uptime falling to 98.95%, and OpenAI shutting down Sora to reallocate resources. Demand for agentic AI is outstripping supply, forcing rationing and product cancellations.

Gala Smith & AI Research Desk · 6h ago · 6 min read · AI-Generated
Source: the-decoder.com via the_decoder · Corroborated

A severe compute capacity shortage is gripping the AI industry, forcing major providers into rationing, product cancellations, and price hikes as demand for agentic AI tools explodes. According to a Wall Street Journal report detailed by The Decoder, GPU prices have surged 48%, Anthropic's API availability has dropped below enterprise standards, and OpenAI is shutting down its Sora video generation app to free up resources.

The Capacity Crunch by the Numbers

The crisis manifests in three concrete metrics:

  1. GPU price increase: +48%, per the Ornn Compute Price Index. Bank of America expects demand to outstrip supply through at least 2029.
  2. Anthropic Claude API uptime: 98.95% over the 90 days ending April 8, 2026, well below the cloud standard of 99.99%.
  3. OpenAI API token usage: 6 billion → 15 billion per minute from October 2025 to March 2026, a 150% increase in five months.

Anthropic: Growth Strains Reliability

Anthropic exemplifies the tension between explosive growth and infrastructure limits. The company's annualized revenue rate (ARR) skyrocketed from $9 billion at the end of 2025 to over $30 billion by April 2026. However, this growth has come at the cost of reliability. Since mid-February, frequent outages have plagued its Claude API, pushing its 90-day uptime to 98.95%.


The impact is tangible for customers. David Hsu, founder of software platform Retool, told the WSJ he prefers Anthropic's Opus 4.6 model but was forced to switch to OpenAI due to persistent service disruptions. This loss of enterprise clients highlights the business risk when AI infrastructure cannot keep pace with demand.

OpenAI's Triage: Killing Sora for Spud

OpenAI is making explicit trade-offs to manage scarce compute. The company announced it will shut down the web and app versions of its Sora video generation tool on April 26, with the API following in September 2026. This move is a direct reallocation of resources toward what the company deems higher priorities: coding and enterprise products built on a new AI model codenamed "Spud."

OpenAI CFO Sarah Friar told the WSJ she spends significant time hunting for near-term compute capacity, and the company is shelving projects to redirect resources. The 150% surge in API token usage from 6 billion to 15 billion per minute between October and March underscores the scale of the demand hitting their systems.
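For context, that growth can be restated as an implied compound monthly rate; this is our arithmetic on the reported endpoints, not a figure from the report:

```python
# The jump from 6B to 15B tokens/minute over five months, expressed as
# total growth and an implied compound monthly growth rate.
start, end, months = 6e9, 15e9, 5
total_growth = end / start - 1                # 1.5, i.e. 150%
monthly = (end / start) ** (1 / months) - 1   # ~20.1% per month, compounded
print(f"total: {total_growth:.0%}, monthly: {monthly:.1%}")
```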

Market Dynamics: GPU Prices and Long-Term Shortages

The hardware market reflects the strain. The Ornn Compute Price Index reports a 48% increase in GPU prices. Analysts at Bank of America see no quick fix, projecting that demand will continue to outstrip supply through at least 2029. This long-term forecast suggests the current rationing and product cancellations are not temporary glitches but early indicators of a sustained structural shortage.

The core driver is the shift from conversational chatbots to agentic AI—autonomous systems that execute multi-step tasks. These agents are far more computationally intensive per query, multiplying the load on data centers.
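A back-of-envelope sketch of that multiplication, using illustrative figures for calls and tokens per request (these numbers are assumptions, not values reported in the article):

```python
# Compare compute per user request: one chat completion vs. a
# hypothetical agent loop. All figures are illustrative assumptions.

def request_tokens(llm_calls: int, tokens_per_call: int) -> int:
    """Total tokens consumed by one user request."""
    return llm_calls * tokens_per_call

chatbot = request_tokens(llm_calls=1, tokens_per_call=2_000)
agent = request_tokens(llm_calls=30, tokens_per_call=4_000)  # plans, tool calls, retries

print(f"chatbot request: {chatbot:,} tokens")   # 2,000
print(f"agent request:   {agent:,} tokens")     # 120,000
print(f"multiplier:      {agent // chatbot}x")  # 60x
```

Even under conservative assumptions, a single agentic task can consume tens of times the tokens of a chat turn, which is why agent adoption translates so directly into data-center load.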

What This Means in Practice

For developers and companies building with AI:

  1. Expect higher costs and volatility. API pricing may increase, and spot instance availability for cloud GPUs will be unpredictable.
  2. Prioritize reliability over model preference. As Retool's experience shows, the "best" model is useless if it's frequently down.
  3. Plan for product discontinuations. OpenAI's shutdown of Sora signals that even popular services from leading labs are not safe if they consume disproportionate compute.
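Point 2 above can be made concrete with a small fallback sketch. The provider call below is a stub and every name is hypothetical; in practice a real client SDK (Anthropic, OpenAI, etc.) would replace `call_provider`:

```python
# Illustrative provider-fallback pattern: prefer one API, but fail over
# to alternatives when it is down, with simple retry and linear backoff.
import time

def call_provider(name: str, prompt: str) -> str:
    # Stub: a real implementation would call the provider's SDK here
    # and raise on timeouts or 5xx responses.
    raise ConnectionError(f"{name} unavailable")

def complete_with_fallback(prompt, providers, retries=2, backoff=1.0):
    """Try each provider in preference order; raise only if all fail."""
    last_error = None
    for name in providers:
        for attempt in range(retries):
            try:
                return call_provider(name, prompt)
            except ConnectionError as err:
                last_error = err
                time.sleep(backoff * (attempt + 1))  # linear backoff
    raise RuntimeError(f"all providers failed: {last_error}")
```

The design choice is to encode provider preference as an ordered list, so "best model first, reliable model second" becomes configuration rather than an outage-day scramble.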

gentic.news Analysis

This compute crisis is the inevitable consequence of the industry's breakneck pivot to agentic workflows, a trend we've tracked since the launch of frameworks like Cognition AI's Devin in early 2025. The 48% GPU price spike directly impacts every layer of the stack, from cloud providers like AWS and Azure to startups trying to train new models. It validates the strategic bets made by companies like Tesla (saving its Nvidia H100 orders for its own FSD development) and xAI, which secured priority capacity by building its own data centers.

The contrast between Anthropic's revenue growth and its falling API reliability is a stark warning. It mirrors challenges faced by earlier high-growth SaaS companies that struggled with technical debt, but at the physical layer of silicon and power. OpenAI's decision to kill Sora—a flagship product unveiled just over a year ago—is a remarkable concession to scarcity. It suggests their internal projections for Spud's compute needs are enormous, and that they are willing to sacrifice a public-facing product to secure capacity for enterprise and coding agents, which likely have clearer monetization paths.

This shortage creates a moat for incumbents with committed capacity (Google, Meta, Microsoft) and raises the barrier to entry for new foundation model players to near-insurmountable levels. The next phase of competition may be less about algorithmic innovation and more about who can secure the most joules of compute and megawatts of power.

Frequently Asked Questions

Why are GPU prices increasing by 48%?

The price surge is driven by a massive imbalance between supply and demand. Demand for AI compute, particularly for running complex agentic AI systems, is growing faster than the semiconductor industry can manufacture advanced GPUs. Bank of America analysts project this supply deficit will persist through at least 2029, suggesting high prices are a new normal.

What does 98.95% API uptime mean for Anthropic users?

An industry-standard uptime for critical cloud services is 99.99% ("four nines"), which translates to about 52 minutes of downtime per year. Anthropic's reported 98.95% uptime over 90 days equates to roughly 22.7 hours of downtime in that period, about 7.5 hours per month, far above acceptable levels for enterprise applications that depend on reliable AI APIs. This unreliability is causing business customers to switch providers.
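The conversion from an uptime percentage to downtime is simple arithmetic, sketched here in Python:

```python
# Convert an uptime percentage over a period into hours of downtime.
def downtime_hours(uptime_pct: float, period_days: int) -> float:
    return period_days * 24 * (1 - uptime_pct / 100)

print(round(downtime_hours(99.99, 365) * 60, 1))  # ~52.6 minutes per year
print(round(downtime_hours(98.95, 90), 1))        # ~22.7 hours over 90 days
print(round(downtime_hours(98.95, 30), 1))        # ~7.6 hours per month
```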

Is OpenAI shutting down the Sora API completely?

Yes. OpenAI has announced a phased shutdown. The web and app interfaces for Sora will go offline on April 26, 2026. The Sora API will remain available until September 2026, after which it will be fully discontinued. The company states this is to reallocate substantial compute resources to other priorities like coding and enterprise AI models.

What is "agentic AI" and why does it use so much compute?

Agentic AI refers to systems that don't just answer questions but autonomously plan and execute multi-step tasks—like writing and deploying code, conducting research, or managing a workflow. Unlike a single chat completion, an agent might make dozens of LLM calls, use external tools, and run in a loop for minutes. This multiplies the computational cost per user request, driving the surge in demand that is overwhelming current infrastructure.
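A minimal toy loop illustrates the pattern; the model and tool below are stubs, and the step counts are illustrative assumptions rather than anything from the article:

```python
# Sketch of why one agentic request costs multiple model calls: the
# agent loops, calling the model (stubbed) and a tool until done.

def llm(prompt: str) -> str:
    # Stub model: decides it is "done" after seeing three tool results.
    return "DONE" if prompt.count("observation") >= 3 else "CALL_TOOL"

def run_tool(step: int) -> str:
    return f"observation {step}"

def run_agent(task: str, max_steps: int = 10) -> int:
    """Run the loop and return how many LLM calls the task consumed."""
    history, calls = task, 0
    for step in range(max_steps):
        action = llm(history)
        calls += 1
        if action == "DONE":
            break
        history += "\n" + run_tool(step)
    return calls

print(run_agent("summarize these docs"))  # 4 LLM calls for one user request
```

Each loop iteration is a full model invocation with a growing context, so a task that takes a few dozen steps consumes far more compute than the single forward pass behind a chat reply.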


AI Analysis

This compute crisis represents a critical inflection point for the AI industry, shifting the bottleneck from algorithmic research to physical infrastructure. For the past two years, progress has been measured by benchmark scores on MMLU or GPQA. Now, the limiting factor is access to H100s and power contracts. This will accelerate several trends: a rush into alternative chip architectures (Groq, Cerebras), increased investment in model efficiency techniques (speculative decoding, mixture-of-experts pruning), and a consolidation of power among a few well-capitalized players.

Practitioners should view this as a hardening of the market. The era of easily accessible, cheap inference for experimental projects is ending. Development will require more careful cost forecasting and a preference for smaller, more efficient models where possible.

The shutdown of Sora is particularly telling; it shows that even dominant players cannot support all frontier applications simultaneously and must make brutal prioritization decisions. The companies that navigate this crisis successfully will be those with deep capital reserves, vertical integration into hardware, or a ruthless focus on the highest-margin AI applications.