LLM API Pricing 2026
Every major large-language-model API, compared per million tokens. Standard flagship models cost $5/M input (GPT-5.5, Claude Opus 4.8), while the cheapest capable workhorse (DeepSeek V4 Flash) costs $0.14/M. Output tokens cost 2–8× input across the models listed here, and batch processing and prompt caching are the two levers that cut real-world spend the most.
OpenAI
| Model | Input /1M | Output /1M | Cached input /1M | Context | Best for |
|---|---|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | $1.25 | 1.05M | Flagship reasoning & agents |
| GPT-5.5 Pro | $30.00 | $180.00 | — | 1.1M | Hardest reasoning, max accuracy |
| GPT-5.4 | $2.50 | $15.00 | — | 1M | Prior-gen default, cheaper |
| GPT-5.2-Codex | $1.75 | $14.00 | — | 400K | Code-specialised workloads |
Cost levers: Batch & Flex run at 50% of standard. Priority is ~2.5× standard. Cached input is ~90% cheaper. Prompts over 272K tokens are billed at 2× input / 1.5× output for the whole session.
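As a rough illustration of how these levers combine, the sketch below estimates the cost of a single GPT-5.5 request from the rates in the table. The helper `gpt55_cost` and its parameters are hypothetical. It assumes the batch discount and the long-context multipliers simply multiply together, that the 2× input multiplier also applies to cached tokens, and that cached tokens count toward the 272K threshold. None of this is confirmed by the pricing notes above, and it models one request rather than a whole session.

```python
# Rates from the table above, in USD per million tokens.
INPUT_RATE = 5.00
OUTPUT_RATE = 30.00
CACHED_RATE = 1.25

LONG_CONTEXT_THRESHOLD = 272_000  # prompt tokens


def gpt55_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one GPT-5.5 request (hypothetical helper).

    input_tokens counts uncached prompt tokens; cached_tokens counts
    prompt tokens served from the cache.
    """
    prompt_tokens = input_tokens + cached_tokens
    in_mult, out_mult = 1.0, 1.0
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        # Long-context surcharge: 2x input, 1.5x output.
        in_mult, out_mult = 2.0, 1.5

    cost = (
        input_tokens * INPUT_RATE * in_mult
        + cached_tokens * CACHED_RATE * in_mult  # assumption: surcharge applies here too
        + output_tokens * OUTPUT_RATE * out_mult
    ) / 1_000_000

    if batch:
        cost *= 0.5  # Batch and Flex run at 50% of standard
    return cost
```

Under these assumptions, a 100K-token prompt with 10K output tokens comes to about $0.80 at standard rates and $0.40 in batch, while a 300K-token prompt with the same output comes to about $3.45 because the surcharge applies to every token.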
Anthropic (Claude)
| Model | Input /1M | Output /1M | Cached input /1M | Context | Best for |
|---|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | $0.50 | 1M | Flagship: complex agentic coding |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 1M | Recommended production default |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K | High-volume routing, extraction |
Cost levers: Batch is 50% off. Prompt caching cuts cached input ~90%. The 1M-token context is flat-rate (no long-context surcharge). inference_geo:"us" adds a 1.1× multiplier vs the global default.
Google (Gemini)
| Model | Input /1M | Output /1M | Cached input /1M | Context | Best for |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $12.00 | — | 200K+ | Flagship multimodal reasoning |
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.15 | 1M | Fast coding & agentic, mid-price |
| Gemini 3 Flash | $0.50 | $3.00 | — | 1M | Cheapest high-volume multimodal |
Cost levers: Batch is 50% off. Cache reads cost ~10% of base input. Inputs over 200K tokens trigger a long-context surcharge on all tokens. Flash tiers retain a free quota; Pro left the free tier in April 2026.
DeepSeek
| Model | Input /1M | Output /1M | Cached input /1M | Context | Best for |
|---|---|---|---|---|---|
| DeepSeek V4 Pro | $1.74 | $3.48 | $0.015 | 1M | Near-flagship reasoning, low cost |
| DeepSeek V4 Flash | $0.14 | $0.28 | $0.0028 | 1M | Cheapest workhorse, huge volume |
Cost levers: Input is split into cache-hit and cache-miss buckets — cache hits can be ~50–98% cheaper. 1M context and 384K max output are included. New accounts get 5M free tokens for 30 days. Promo rates have been well below list.
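Because input is billed in two buckets, the effective input price depends on the share of tokens that hit the cache. The sketch below computes that blended rate. The function name `blended_input_rate` is hypothetical, and it assumes the table's input column is the cache-miss rate and the cached column is the cache-hit rate.

```python
def blended_input_rate(miss_rate, hit_rate, hit_ratio):
    """Effective USD per million input tokens for a given cache-hit ratio.

    miss_rate and hit_rate are USD per million tokens; hit_ratio is the
    fraction of input tokens served from cache (0.0 to 1.0).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_rate + (1.0 - hit_ratio) * miss_rate


# Rates from the table above; the 80% hit ratio is an illustrative assumption.
v4_pro = blended_input_rate(miss_rate=1.74, hit_rate=0.015, hit_ratio=0.8)
v4_flash = blended_input_rate(miss_rate=0.14, hit_rate=0.0028, hit_ratio=0.8)
```

At an 80% hit ratio, V4 Pro input works out to roughly $0.36/M and V4 Flash to roughly $0.03/M, so the cache-miss share dominates the bill even when most tokens are cached.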
xAI (Grok)
| Model | Input /1M | Output /1M | Cached input /1M | Context | Best for |
|---|---|---|---|---|---|
| Grok 4.3 | $1.25 | $2.50 | — | 256K | Current flagship, value-priced |
| Grok 4.20 | $2.00 | $6.00 | — | 2M | Long-context option |
| Grok 4.1 Fast | $0.20 | $0.50 | $0.05 | 2M | Cheap workhorse, 2M context |
Cost levers: Batch is 50% off all models. Server-side tools (web/X search, code exec) bill separately per 1,000 calls. Cached input is heavily discounted. New users get $25 in credits (plus data-sharing credits).
Mistral
| Model | Input /1M | Output /1M | Cached input /1M | Context | Best for |
|---|---|---|---|---|---|
| Mistral Large 3 (2512) | $0.50 | $1.50 | — | 262K | Open-weight flagship, low cost |
Cost levers: Batch is 50% off. Open-weight models can also be self-hosted, removing per-token API cost entirely.
How to actually cut your LLM bill
- Cache stable prompts. Repeated system prompts and long context get 90–98% off via prompt caching on OpenAI, Anthropic, Google and DeepSeek — often the single biggest saving for agents.
- Batch the non-urgent. Asynchronous Batch endpoints are 50% off at OpenAI, Anthropic, Google, xAI and Mistral, with turnaround under 24 hours.
- Right-size the model. Sonnet 4.6, Gemini 3.5 Flash and Grok 4.1 Fast deliver near-flagship quality at a fraction of flagship price — reserve Opus/GPT-5.5 Pro for the hardest 10%.
- Watch output length. Output costs 2–8× input depending on the model; terse-output prompting compounds across millions of calls.
- Mind the asterisks. Long-context surcharges (OpenAI >272K, Google >200K) and US-residency multipliers (Anthropic 1.1×) can quietly raise effective cost.
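To make the right-sizing point concrete, the sketch below prices one workload across several models using the standard rates from the tables above. The helper names `monthly_cost` and `rank_models` and the example workload (1B input and 200M output tokens per month) are illustrative assumptions, and the calculation ignores caching, batch discounts and surcharges.

```python
# (input, output) in USD per million tokens, copied from the tables above.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Grok 4.1 Fast": (0.20, 0.50),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def monthly_cost(model, input_tokens, output_tokens):
    """USD cost of a workload at standard list rates (hypothetical helper)."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def rank_models(input_tokens, output_tokens):
    """Model names sorted from cheapest to most expensive for the workload."""
    return sorted(
        PRICES, key=lambda m: monthly_cost(m, input_tokens, output_tokens)
    )


# Hypothetical workload: 1B input tokens and 200M output tokens per month.
ranking = rank_models(1_000_000_000, 200_000_000)
sonnet = monthly_cost("Claude Sonnet 4.6", 1_000_000_000, 200_000_000)
opus = monthly_cost("Claude Opus 4.8", 1_000_000_000, 200_000_000)
```

For this workload, Sonnet 4.6 comes to about $6,000 per month against about $10,000 for Opus 4.8. Price is only half of the decision: the ranking says nothing about whether the cheaper model meets the quality bar for the task.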
LLM API pricing — FAQ
How much does the GPT-5.5 API cost in 2026?
OpenAI's GPT-5.5 costs $5.00 per million input tokens and $30.00 per million output tokens at standard rates. That is double GPT-5.4's $2.50/$15.00, a change that arrived with the April 23, 2026 release. Cached input is $1.25/M, and Batch or Flex processing runs at 50% of standard ($2.50/$15.00).
What is the cheapest LLM API in 2026?
Among frontier-adjacent models, DeepSeek V4 Flash ($0.14 input / $0.28 output per million tokens) and xAI's Grok 4.1 Fast ($0.20 / $0.50, with a 2M-token context) are the cheapest, followed by Google's Gemini 3 Flash ($0.50 / $3.00). With prompt caching, DeepSeek V4 Flash cache hits fall to $0.0028 per million input tokens.
How much does the Claude API cost?
Anthropic's flagship Claude Opus 4.8 is $5.00 input / $25.00 output per million tokens; Sonnet 4.6 (the recommended production default) is $3.00 / $15.00; and Haiku 4.5 is $1.00 / $5.00. Prompt caching cuts cached input ~90%, and Batch is 50% off across all three.
Why are output tokens more expensive than input tokens?
Output (generated) tokens cost more than input tokens at every provider listed here: roughly 5–8× at OpenAI, Anthropic and Google, and 2–3× at DeepSeek, xAI and Mistral. For example, Claude models charge 5× output-over-input, and GPT-5.5 charges 6×. Generation is sequential and compute-heavy, so providers price it higher. This makes concise outputs and prompt caching (which discounts repeated input) the two biggest cost levers.
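The output-to-input ratio for any listed model can be checked directly from the tables. A minimal sketch, using a handful of rows copied from above and a hypothetical helper named `output_input_ratio`:

```python
# (input, output) in USD per million tokens, copied from the tables above.
LIST_PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Mistral Large 3 (2512)": (0.50, 1.50),
    "Grok 4.3": (1.25, 2.50),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def output_input_ratio(model):
    """How many times more an output token costs than an input token."""
    in_rate, out_rate = LIST_PRICES[model]
    return out_rate / in_rate


ratios = {model: output_input_ratio(model) for model in LIST_PRICES}
```

The higher the ratio, the more a trimmed response saves relative to a trimmed prompt, so terse-output prompting pays off most on the models with the steepest multiplier.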
Do batch and caching discounts really cut LLM costs?
Yes, substantially. Batch/asynchronous processing is 50% off at OpenAI, Anthropic, Google, xAI and Mistral. Prompt caching discounts repeated input by ~90% at OpenAI, Anthropic and Google (where cache reads cost ~10% of base input), and by up to ~98% for DeepSeek cache hits. For agents with large, stable system prompts, caching alone can cut total spend by more than half.
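A worked example shows how a large, stable system prompt drives the saving. The sketch below uses the Claude Sonnet 4.6 rates from the table above. The helper `request_cost` and the token counts (a 20K-token system prompt, 500 fresh input tokens, 300 output tokens) are hypothetical. It assumes every call is a cache hit and ignores any cost of writing to the cache.

```python
def request_cost(stable_tokens, fresh_tokens, output_tokens,
                 in_rate, out_rate, cached_rate=None):
    """USD cost of one call; rates are USD per million tokens.

    stable_tokens is the repeated prefix (for example a system prompt).
    If cached_rate is given, the prefix is billed at that rate instead
    of the standard input rate.
    """
    stable_rate = in_rate if cached_rate is None else cached_rate
    return (
        stable_tokens * stable_rate
        + fresh_tokens * in_rate
        + output_tokens * out_rate
    ) / 1_000_000


# Claude Sonnet 4.6 rates from the table above; token counts are illustrative.
uncached = request_cost(20_000, 500, 300, in_rate=3.00, out_rate=15.00)
cached = request_cost(20_000, 500, 300, in_rate=3.00, out_rate=15.00,
                      cached_rate=0.30)
saving = 1 - cached / uncached
```

Under these assumptions the call drops from about $0.066 to about $0.012, a saving of roughly 82%. The saving shrinks as the stable prefix gets smaller relative to fresh input and output.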