Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

LLM API Pricing 2026

Every major large-language-model API, compared per million tokens. Flagship reasoning runs from $5/M input (GPT-5.5, Claude Opus 4.8) down to $0.14/M for the cheapest capable workhorses (DeepSeek V4 Flash). Output tokens cost 3–6× input almost everywhere, and batch + caching are the two levers that cut real-world spend the most.

Last verified June 8, 2026.Standard rates, USD per 1M tokens. Prices change often and promos vary — always confirm on the provider's official page (linked per section) before committing to production.
Cheapest capable
DeepSeek V4 Flash
$0.14 / $0.28 per 1M
Best value flagship
Grok 4.3
$1.25 / $2.50 per 1M
Priciest
GPT-5.5 Pro
$30 / $180 per 1M
ModelInput /1MOutput /1MCached inContextBest for
GPT-5.5$5.00$30.00$1.251.05MFlagship reasoning & agents
GPT-5.5 Pro$30.00$180.001.1MHardest reasoning, max accuracy
GPT-5.4$2.50$15.001MPrior-gen default, cheaper
GPT-5.2-Codex$1.75$14.00400KCode-specialised workloads

Cost levers: Batch & Flex run at 50% of standard. Priority is ~2.5× standard. Cached input is ~90% cheaper. Prompts over 272K tokens are billed at 2× input / 1.5× output for the whole session.

Anthropic (Claude)

Official pricing →
ModelInput /1MOutput /1MCached inContextBest for
Claude Opus 4.8$5.00$25.00$0.501MFlagship: complex agentic coding
Claude Sonnet 4.6$3.00$15.00$0.301MRecommended production default
Claude Haiku 4.5$1.00$5.00$0.10200KHigh-volume routing, extraction

Cost levers: Batch is 50% off. Prompt caching cuts cached input ~90%. The 1M-token context is flat-rate (no long-context surcharge). inference_geo:"us" adds a 1.1× multiplier vs the global default.

Google (Gemini)

Official pricing →
ModelInput /1MOutput /1MCached inContextBest for
Gemini 3.1 Pro$2.00$12.00200K+Flagship multimodal reasoning
Gemini 3.5 Flash$1.50$9.00$0.151MFast coding & agentic, mid-price
Gemini 3 Flash$0.50$3.001MCheapest high-volume multimodal

Cost levers: Batch is 50% off. Cache reads cost ~10% of base input. Inputs over 200K tokens trigger a long-context surcharge on all tokens. Flash tiers retain a free quota; Pro left the free tier in April 2026.

ModelInput /1MOutput /1MCached inContextBest for
DeepSeek V4 Pro$1.74$3.48$0.0151MNear-flagship reasoning, low cost
DeepSeek V4 Flash$0.14$0.28$0.00281MCheapest workhorse, huge volume

Cost levers: Input is split into cache-hit and cache-miss buckets — cache hits can be ~50–98% cheaper. 1M context and 384K max output are included. New accounts get 5M free tokens for 30 days. Promo rates have been well below list.

ModelInput /1MOutput /1MCached inContextBest for
Grok 4.3$1.25$2.50256KCurrent flagship, value-priced
Grok 4.20$2.00$6.002MLong-context option
Grok 4.1 Fast$0.20$0.50$0.052MCheap workhorse, 2M context

Cost levers: Batch is 50% off all models. Server-side tools (web/X search, code exec) bill separately per 1,000 calls. Cached input is heavily discounted. New users get $25 in credits (plus data-sharing credits).

ModelInput /1MOutput /1MCached inContextBest for
Mistral Large 3 (2512)$0.50$1.50262KOpen-weight flagship, low cost

Cost levers: Batch is 50% off. Open-weight models can also be self-hosted, removing per-token API cost entirely.

How to actually cut your LLM bill

  • Cache stable prompts. Repeated system prompts and long context get 90–98% off via prompt caching on OpenAI, Anthropic, Google and DeepSeek — often the single biggest saving for agents.
  • Batch the non-urgent. Asynchronous Batch endpoints are a flat 50% off at every major provider, with <24h turnaround.
  • Right-size the model. Sonnet 4.6, Gemini 3.5 Flash and Grok 4.1 Fast deliver near-flagship quality at a fraction of flagship price — reserve Opus/GPT-5.5 Pro for the hardest 10%.
  • Watch output length. Output is 3–6× input; terse-output prompting compounds across millions of calls.
  • Mind the asterisks. Long-context surcharges (OpenAI >272K, Google >200K) and US-residency multipliers (Anthropic 1.1×) can quietly raise effective cost.

LLM API pricing — FAQ

How much does the GPT-5.5 API cost in 2026?

OpenAI's GPT-5.5 costs $5.00 per million input tokens and $30.00 per million output tokens at standard rates — a 2× increase over GPT-5.4 ($2.50/$15.00) introduced with the April 23, 2026 release. Cached input is $1.25/M, and Batch or Flex processing runs at 50% of standard ($2.50/$15.00).

What is the cheapest LLM API in 2026?

Among frontier-adjacent models, DeepSeek V4 Flash ($0.14 input / $0.28 output per million tokens) and xAI's Grok 4.1 Fast ($0.20 / $0.50, with a 2M-token context) are the cheapest, followed by Google's Gemini 3 Flash ($0.50 / $3.00). With prompt caching, DeepSeek cache hits fall below $0.01 per million input tokens.

How much does the Claude API cost?

Anthropic's flagship Claude Opus 4.8 is $5.00 input / $25.00 output per million tokens; Sonnet 4.6 (the recommended production default) is $3.00 / $15.00; and Haiku 4.5 is $1.00 / $5.00. Prompt caching cuts cached input ~90%, and Batch is 50% off across all three.

Why are output tokens more expensive than input tokens?

Across nearly every provider, output (generated) tokens cost roughly 3–6× input tokens — for example, Claude models charge 5× output-over-input, and GPT-5.5 charges 6×. Generation is sequential and compute-heavy, so providers price it higher. This means concise outputs and prompt caching (which discounts repeated input) are the two biggest cost levers.

Do batch and caching discounts really cut LLM costs?

Yes, substantially. Batch/asynchronous processing is 50% off at OpenAI, Anthropic, Google, xAI and Mistral. Prompt caching discounts repeated input by ~90% (OpenAI, Anthropic), ~90% via 10% cache reads (Google), or up to ~98% (DeepSeek cache hits). For agents with large, stable system prompts, caching alone can cut total spend by more than half.

Related: Best LLMs 2026 · Best AI Tools 2026 · Anthropic vs OpenAI · AI Leaderboard · AI Answers