LLM API Pricing 2026

Every major large-language-model API, compared per million tokens. Flagship reasoning runs from $5/M input (GPT-5.5, Claude Opus 4.8) down to $0.14/M for the cheapest capable workhorses (DeepSeek V4 Flash). Output tokens cost 3–6× input almost everywhere, and batch + caching are the two levers that cut real-world spend the most.

Last verified June 8, 2026.Standard rates, USD per 1M tokens. Prices change often and promos vary — always confirm on the provider's official page (linked per section) before committing to production.

Cheapest capable

DeepSeek V4 Flash

$0.14 / $0.28 per 1M

Best value flagship

Grok 4.3

$1.25 / $2.50 per 1M

Priciest

GPT-5.5 Pro

$30 / $180 per 1M

OpenAI

Official pricing →

Model	Input /1M	Output /1M	Cached in	Context	Best for
GPT-5.5	$5.00	$30.00	$1.25	1.05M	Flagship reasoning & agents
GPT-5.5 Pro	$30.00	$180.00	—	1.1M	Hardest reasoning, max accuracy
GPT-5.4	$2.50	$15.00	—	1M	Prior-gen default, cheaper
GPT-5.2-Codex	$1.75	$14.00	—	400K	Code-specialised workloads

Cost levers: Batch & Flex run at 50% of standard. Priority is ~2.5× standard. Cached input is ~90% cheaper. Prompts over 272K tokens are billed at 2× input / 1.5× output for the whole session.

Anthropic (Claude)

Official pricing →

Model	Input /1M	Output /1M	Cached in	Context	Best for
Claude Opus 4.8	$5.00	$25.00	$0.50	1M	Flagship: complex agentic coding
Claude Sonnet 4.6	$3.00	$15.00	$0.30	1M	Recommended production default
Claude Haiku 4.5	$1.00	$5.00	$0.10	200K	High-volume routing, extraction

Cost levers: Batch is 50% off. Prompt caching cuts cached input ~90%. The 1M-token context is flat-rate (no long-context surcharge). inference_geo:"us" adds a 1.1× multiplier vs the global default.

Google (Gemini)

Official pricing →

Model	Input /1M	Output /1M	Cached in	Context	Best for
Gemini 3.1 Pro	$2.00	$12.00	—	200K+	Flagship multimodal reasoning
Gemini 3.5 Flash	$1.50	$9.00	$0.15	1M	Fast coding & agentic, mid-price
Gemini 3 Flash	$0.50	$3.00	—	1M	Cheapest high-volume multimodal

Cost levers: Batch is 50% off. Cache reads cost ~10% of base input. Inputs over 200K tokens trigger a long-context surcharge on all tokens. Flash tiers retain a free quota; Pro left the free tier in April 2026.

DeepSeek

Official pricing →

Model	Input /1M	Output /1M	Cached in	Context	Best for
DeepSeek V4 Pro	$1.74	$3.48	$0.015	1M	Near-flagship reasoning, low cost
DeepSeek V4 Flash	$0.14	$0.28	$0.0028	1M	Cheapest workhorse, huge volume

Cost levers: Input is split into cache-hit and cache-miss buckets — cache hits can be ~50–98% cheaper. 1M context and 384K max output are included. New accounts get 5M free tokens for 30 days. Promo rates have been well below list.

xAI (Grok)

Official pricing →

Model	Input /1M	Output /1M	Cached in	Context	Best for
Grok 4.3	$1.25	$2.50	—	256K	Current flagship, value-priced
Grok 4.20	$2.00	$6.00	—	2M	Long-context option
Grok 4.1 Fast	$0.20	$0.50	$0.05	2M	Cheap workhorse, 2M context

Cost levers: Batch is 50% off all models. Server-side tools (web/X search, code exec) bill separately per 1,000 calls. Cached input is heavily discounted. New users get $25 in credits (plus data-sharing credits).

Mistral

Official pricing →

Model	Input /1M	Output /1M	Cached in	Context	Best for
Mistral Large 3 (2512)	$0.50	$1.50	—	262K	Open-weight flagship, low cost

Cost levers: Batch is 50% off. Open-weight models can also be self-hosted, removing per-token API cost entirely.

How to actually cut your LLM bill

Cache stable prompts. Repeated system prompts and long context get 90–98% off via prompt caching on OpenAI, Anthropic, Google and DeepSeek — often the single biggest saving for agents.
Batch the non-urgent. Asynchronous Batch endpoints are a flat 50% off at every major provider, with <24h turnaround.
Right-size the model. Sonnet 4.6, Gemini 3.5 Flash and Grok 4.1 Fast deliver near-flagship quality at a fraction of flagship price — reserve Opus/GPT-5.5 Pro for the hardest 10%.
Watch output length. Output is 3–6× input; terse-output prompting compounds across millions of calls.
Mind the asterisks. Long-context surcharges (OpenAI >272K, Google >200K) and US-residency multipliers (Anthropic 1.1×) can quietly raise effective cost.

LLM API pricing — FAQ

How much does the GPT-5.5 API cost in 2026?

OpenAI's GPT-5.5 costs $5.00 per million input tokens and $30.00 per million output tokens at standard rates — a 2× increase over GPT-5.4 ($2.50/$15.00) introduced with the April 23, 2026 release. Cached input is $1.25/M, and Batch or Flex processing runs at 50% of standard ($2.50/$15.00).

What is the cheapest LLM API in 2026?

Among frontier-adjacent models, DeepSeek V4 Flash ($0.14 input / $0.28 output per million tokens) and xAI's Grok 4.1 Fast ($0.20 / $0.50, with a 2M-token context) are the cheapest, followed by Google's Gemini 3 Flash ($0.50 / $3.00). With prompt caching, DeepSeek cache hits fall below $0.01 per million input tokens.

How much does the Claude API cost?

Anthropic's flagship Claude Opus 4.8 is $5.00 input / $25.00 output per million tokens; Sonnet 4.6 (the recommended production default) is $3.00 / $15.00; and Haiku 4.5 is $1.00 / $5.00. Prompt caching cuts cached input ~90%, and Batch is 50% off across all three.

Why are output tokens more expensive than input tokens?

Across nearly every provider, output (generated) tokens cost roughly 3–6× input tokens — for example, Claude models charge 5× output-over-input, and GPT-5.5 charges 6×. Generation is sequential and compute-heavy, so providers price it higher. This means concise outputs and prompt caching (which discounts repeated input) are the two biggest cost levers.

Do batch and caching discounts really cut LLM costs?

Yes, substantially. Batch/asynchronous processing is 50% off at OpenAI, Anthropic, Google, xAI and Mistral. Prompt caching discounts repeated input by ~90% (OpenAI, Anthropic), ~90% via 10% cache reads (Google), or up to ~98% (DeepSeek cache hits). For agents with large, stable system prompts, caching alone can cut total spend by more than half.