gentic.news — AI News Intelligence Platform

AWS Never Retired an A100 Server, CEO Says Amid Chip Shortage

AWS CEO Matt Garman stated that A100 servers are completely sold out and never retired, as demand for older chips outpaces supply. This underscores the prolonged GPU shortage and the value of legacy hardware in cloud AI.

Key Takeaways

  • AWS CEO Matt Garman stated that A100 servers are completely sold out and never retired, as demand for older chips outpaces supply.
  • This underscores the prolonged GPU shortage and the value of legacy hardware in cloud AI.

What Happened

AWS CEO Matt Garman revealed that the company has never retired an A100 server and is currently completely sold out of them, citing persistent demand outstripping supply. Speaking at a recent event, Garman noted, "Because there is so much more demand than supply, there typically still is demand for the older chips, actually. And today, we actually are completely sold out of and have never retired an A100 server, as an example."

This statement underscores the ongoing GPU shortage that has plagued the AI industry since the surge in large language model training and inference workloads. The A100, based on NVIDIA's Ampere architecture, was released in 2020 and is now considered a previous-generation chip compared to the newer H100 (Hopper) and B100 (Blackwell) families.

Context

The A100 was NVIDIA's flagship data center GPU before the H100 launched in 2022. Despite being superseded, its availability and compatibility with existing infrastructure make it a critical resource for cloud customers who cannot access newer hardware or whose workloads are optimized for the A100's architecture.

AWS's admission that it has never decommissioned an A100 server and remains sold out highlights two key dynamics:

  1. Demand outstrips supply for all GPU generations, not just the latest ones.
  2. Legacy hardware retains value in AI workloads, especially for inference and fine-tuning, where raw performance gains from newer chips may not justify migration costs.

This is consistent with broader industry trends. Microsoft Azure and Google Cloud have also reported sustained demand for older GPU instances, and NVIDIA's data center revenue has continued to grow even as new products launch.

Key Numbers

  • A100 release year: 2020
  • AWS A100 server status: never retired, completely sold out
  • Demand vs. supply: demand exceeds supply even for older chips
  • Successor chips: H100 (2022), B100 (2024)

What This Means in Practice

For AI engineers and cloud architects, this means provisioning GPU capacity remains a bottleneck. If you need A100s, you may face long wait times or be forced to use less efficient alternatives. The lack of retirement suggests AWS expects continued demand for years to come, so planning for multi-year reservations or alternative architectures (e.g., custom chips like Trainium) is prudent.
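One way to make that planning concrete is to encode a fallback order across instance families up front. The sketch below uses real AWS instance type names (p4d.24xlarge is the A100 family, p5.48xlarge the H100 family, trn1.32xlarge the Trainium family), but the availability snapshot and the `pick_instance` helper are illustrative assumptions — in practice you would populate the map from EC2 capacity queries or your reservations, not hardcode it.

```python
# Sketch: pick the first GPU instance type with capacity from an
# ordered preference list. Instance type names are real AWS families
# (p4d = A100, p5 = H100, trn1 = Trainium), but the availability
# data here is invented for illustration.

PREFERENCE = ["p4d.24xlarge", "p5.48xlarge", "trn1.32xlarge"]

def pick_instance(available):
    """Return the most-preferred instance type that has capacity,
    or None if every tier is sold out."""
    for instance_type in PREFERENCE:
        if available.get(instance_type):
            return instance_type
    return None  # nothing available: wait-list or try another region

# Hypothetical snapshot: A100s sold out, as AWS's CEO describes.
snapshot = {"p4d.24xlarge": False, "p5.48xlarge": True, "trn1.32xlarge": True}
print(pick_instance(snapshot))  # falls through to the H100 tier
```

The ordering itself is the decision that matters: putting Trainium last reflects the caveat above that custom silicon is not a drop-in replacement for NVIDIA GPUs.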

gentic.news Analysis

This statement from AWS's CEO is a rare admission of the severity of the GPU crunch. While it's common knowledge that H100s are hard to get, the fact that even older A100s are completely sold out indicates the shortage is deeper than many realize. It's not just about cutting-edge hardware; every available GPU is being consumed.

This aligns with trends we've been tracking at gentic.news. For instance, in our coverage of the NVIDIA earnings call, we noted that data center revenue was up over 400% year-over-year, driven by demand that spans multiple product generations. The A100's longevity also mirrors the pattern we saw with the V100, which remained in production for years after its successor launched.

The implication for cloud customers is clear: don't expect relief soon. AWS's investment in custom silicon (Trainium and Inferentia) is partly a response to this shortage, but those chips are optimized for specific workloads and may not be drop-in replacements for NVIDIA GPUs. Enterprises should consider multi-cloud strategies, reserved instances, or even on-premises deployments to secure compute capacity.

Frequently Asked Questions

Why is AWS still sold out of A100 servers?

Demand for AI compute has outstripped supply for years, and A100s remain useful for inference and fine-tuning workloads. AWS has never retired them because customers keep renting them, and the company cannot produce enough new servers to meet demand.

Is the A100 still relevant for AI workloads in 2026?

Yes. While the H100 and B100 offer higher performance, the A100 is sufficient for many inference tasks and fine-tuning jobs. Its widespread software support and lower cost compared to newer chips make it a practical choice when available.

How does this compare to the H100 shortage?

The H100 has been even harder to procure, with wait times exceeding 6 months at some cloud providers. The A100 shortage is less severe but still significant, as AWS's statement confirms.

What should I do if I need A100 capacity?

Consider reserving instances in advance, using alternative GPU types (e.g., L40S, H100), or exploring AWS's custom Trainium chips for training workloads. Multi-cloud strategies can also help distribute demand across providers.

AI Analysis

The sustained demand for A100s highlights a key reality: AI inference is becoming the dominant compute load, and older hardware is perfectly adequate for it. The A100's memory bandwidth (2 TB/s) and 80 GB of HBM2e memory are sufficient for running many large language models at acceptable throughput. This is not a case of customers settling for less; it's a rational economic choice. When H100s are 2-3x more expensive per hour and often unavailable, the A100 becomes the pragmatic option.

From a systems perspective, this also means that cloud providers are incentivized to keep older hardware online longer than they might have historically. The typical 3-4 year refresh cycle for data center GPUs is extending to 5-6 years or more. This has implications for energy efficiency (older chips consume more power per FLOP) and for software optimization — engineers should continue optimizing for the Ampere architecture, not just Hopper or Blackwell.

For practitioners, this is a signal to invest in GPU-agnostic frameworks like vLLM or TensorRT-LLM that can efficiently target multiple GPU generations. The ability to switch seamlessly between A100 and H100 instances without code changes is becoming a competitive advantage. The A100's availability may improve as more H100 and B100 capacity comes online, but AWS's statement suggests no immediate relief.
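The "rational economic choice" argument can be made concrete with a back-of-envelope cost-per-token comparison. The hourly prices and throughput figures below are illustrative assumptions, not quoted AWS rates — the point is only the shape of the calculation: a chip that costs 2.5x as much but delivers 2x the throughput is more expensive per token served.

```python
# Back-of-envelope: cost per million tokens served, A100 vs H100.
# Prices and throughputs are illustrative assumptions, not AWS
# list prices -- substitute your own measurements.

def cost_per_million_tokens(price_per_hour, tokens_per_second):
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Assumed: the H100 is ~2.5x the hourly price but only ~2x the
# inference throughput for this hypothetical workload.
a100 = cost_per_million_tokens(price_per_hour=4.0, tokens_per_second=1500)
h100 = cost_per_million_tokens(price_per_hour=10.0, tokens_per_second=3000)

print(f"A100: ${a100:.2f}/M tokens, H100: ${h100:.2f}/M tokens")
# → A100: $0.74/M tokens, H100: $0.93/M tokens
```

Under these assumed numbers the older chip wins on cost per token, which is exactly the dynamic keeping A100 fleets fully booked; if the newer chip's throughput advantage exceeds its price premium, the conclusion flips.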