gentic.news — AI News Intelligence Platform

AI Cost Calculator 2026

Enter your usage and instantly see what every major LLM API will cost you per month — GPT-5.5, Claude Opus 4.8, Gemini 3.1, DeepSeek V4, Grok, Llama 4 and more, ranked cheapest first. Includes prompt-caching and batch discounts. Free, no signup, results are shareable.

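Inputs (default example)
Input tokens per request (prompt + retrieved context): 6,000
Output tokens per request (generated answer length): 600
Requests per month (total API calls): 50,000
Cached share of input: 60%. Prompt caching discounts repeated input by about 90% (about 98% on DeepSeek), so large, stable system prompts benefit most.
Batch mode: non-real-time jobs run at half price.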
Cheapest for this workload
DeepSeek V4 Flash · DeepSeek
$25.70/mo
vs $14.4k/mo top-end: save up to 99.8%
| Model | Maker | Context | Per request | Monthly |
|---|---|---|---|---|
| DeepSeek V4 Flash (open, cheapest) | DeepSeek | 1.0M | <$0.01 | $25.70 |
| Llama 4 Scout (open) | Meta | 10M | <$0.01 | $33.00 |
| Grok 4.1 Fast | xAI | 2M | <$0.01 | $48.00 |
| Llama 4 Maverick (open) | Meta | 1M | <$0.01 | $63.00 |
| DeepSeek V4 Pro (open) | DeepSeek | 1.0M | <$0.01 | $81.00 |
| Gemini 3.1 Flash-Lite | Google | 1.0M | <$0.01 | $120 |
| Mistral Large 3 (2512) (open) | Mistral AI | 262K | <$0.01 | $195 |
| Kimi K2.6 (open) | Moonshot AI | 262K | <$0.01 | $214 |
| DeepSeek-R1 (open) | DeepSeek | 131K | <$0.01 | $231 |
| MiniMax M3 (open) | MiniMax | 1M | <$0.01 | $252 |
| Claude Haiku 4.5 | Anthropic | 200K | <$0.01 | $288 |
| Amazon Nova Pro | Amazon | 300K | <$0.01 | $336 |
| Grok 4.3 | xAI | 1M | <$0.01 | $450 |
| Gemini 3.5 Flash | Google | 1.0M | <$0.01 | $477 |
| GLM-5.2 (open) | Z.ai | 1M | $0.0110 | $552 |
| Qwen3.7-Max | Alibaba | 1M | $0.0114 | $570 |
| Gemini 3.1 Pro | Google | 2M | $0.0127 | $636 |
| GPT-5.4 | OpenAI | 1M | $0.0159 | $795 |
| Claude Sonnet 4.6 | Anthropic | 1M | $0.0173 | $864 |
| GPT-5.2-Codex | OpenAI | 400K | $0.0189 | $945 |
| Command A (open) | Cohere | 256K | $0.0210 | $1050 |
| Amazon Nova Premier | Amazon | 1M | $0.0225 | $1125 |
| Claude Opus 4.8 | Anthropic | 1M | $0.0288 | $1440 |
| GPT-5.5 | OpenAI | 1.1M | $0.0318 | $1590 |
| GPT-5.5 Pro | OpenAI | 1.1M | $0.2880 | $14.4k |
Estimate at standard USD list rates · 50,000 req/mo × (6,000 in + 600 out) tokens. Caching applies to input only; batch to both.
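
The estimate described above can be approximated with a short Python sketch. The prices are copied from the reference table on this page for a handful of models; the names `PRICES` and `estimate`, and the rule that batch halves the whole bill and stacks with caching, are assumptions made for illustration, since the page only states that caching applies to input and batch to both.

```python
# (input, output, cached input) in USD per 1M tokens, copied from the
# reference table on this page; None means no cached rate is listed.
PRICES = {
    "GPT-5.5": (5.00, 30.00, 0.50),
    "Claude Opus 4.8": (5.00, 25.00, 0.50),
    "Grok 4.1 Fast": (0.20, 0.50, 0.05),
    "DeepSeek V4 Flash": (0.14, 0.28, 0.0028),
    "Llama 4 Scout": (0.08, 0.30, None),
}


def estimate(model, requests=50_000, in_tokens=6_000, out_tokens=600,
             cache_share=0.60, batch=False):
    """Estimated monthly USD cost for one model (illustrative sketch)."""
    in_price, out_price, cached_price = PRICES[model]
    if cached_price is None:
        cached_price = in_price  # assumption: no listed cache rate, no discount
    # Caching applies to input only: blend the cached and uncached input rates.
    blended_in = (1 - cache_share) * in_price + cache_share * cached_price
    total = requests * (in_tokens * blended_in + out_tokens * out_price) / 1_000_000
    # Assumption: batch halves the whole bill and stacks with caching.
    return total * 0.5 if batch else total


# Rank the models cheapest first for the default workload.
ranking = sorted(PRICES, key=estimate)
for name in ranking:
    print(f"{name:<18} ${estimate(name):,.2f}")
```

With the defaults (50,000 requests, 6,000 input and 600 output tokens, 60% cached input, no batch) this sketch reproduces the monthly figures shown in the ranking, for example $25.70 for DeepSeek V4 Flash and $1,590 for GPT-5.5.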

2026 LLM API price reference · per 1M tokens · verified 2026-06-20

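| Model | Maker | Input | Output | Cached in | Context |
|---|---|---|---|---|---|
| GPT-5.5 | OpenAI | $5.00 | $30.00 | $0.50 | 1.1M |
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | | 1.1M |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | $0.25 | 1M |
| GPT-5.2-Codex | OpenAI | $1.75 | $14.00 | | 400K |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | $0.50 | 1M |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.30 | 1M |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $0.10 | 200K |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | $0.20 | 2M |
| Gemini 3.5 Flash | Google | $1.50 | $9.00 | $0.15 | 1.0M |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | | 1.0M |
| DeepSeek V4 Pro | DeepSeek | $0.43 | $0.87 | $0.0150 | 1.0M |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | $0.0028 | 1.0M |
| DeepSeek-R1 | DeepSeek | $0.55 | $2.19 | | 131K |
| Grok 4.3 | xAI | $1.25 | $2.50 | | 1M |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | $0.0500 | 2M |
| Mistral Large 3 (2512) | Mistral AI | $0.50 | $1.50 | | 262K |
| Qwen3.7-Max | Alibaba | $2.50 | $7.50 | $0.25 | 1M |
| Kimi K2.6 | Moonshot AI | $0.67 | $3.50 | $0.16 | 262K |
| GLM-5.2 | Z.ai | $1.40 | $4.40 | | 1M |
| Command A | Cohere | $2.50 | $10.00 | | 256K |
| Amazon Nova Premier | Amazon | $2.50 | $12.50 | | 1M |
| Amazon Nova Pro | Amazon | $0.80 | $3.20 | | 300K |
| MiniMax M3 | MiniMax | $0.60 | $2.40 | | 1M |
| Llama 4 Maverick | Meta | $0.15 | $0.60 | | 1M |
| Llama 4 Scout | Meta | $0.08 | $0.30 | | 10M |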

Standard list rates. For caching/batch/long-context levers and official sources, see the full LLM API pricing guide.

AI cost calculator — FAQ

How do I estimate my LLM API cost?

Multiply your monthly requests by the input and output tokens per request, then apply each model's per-million-token rate: cost = (monthly input tokens × input price + monthly output tokens × output price) ÷ 1,000,000. This calculator does it across every major 2026 model at once and adds caching and batch discounts. Output tokens cost 3–6× as much as input tokens for most models listed here, so output length is the biggest per-token lever; in input-heavy workloads such as RAG, input volume and caching can still account for most of the bill.
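
The formula above can be sketched in a few lines of Python. The function name `monthly_cost` and its parameter names are illustrative and not part of the calculator; the rates used are the GPT-5.5 list prices from the reference table, with no caching or batch discount applied.

```python
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """Monthly USD cost at list rates; prices are USD per 1M tokens."""
    monthly_in = requests * in_tokens      # total input tokens per month
    monthly_out = requests * out_tokens    # total output tokens per month
    return (monthly_in * in_price + monthly_out * out_price) / 1_000_000


# 50,000 requests/month, 6,000 input and 600 output tokens per request,
# at $5.00 input / $30.00 output per 1M tokens (GPT-5.5 list rates).
print(monthly_cost(50_000, 6_000, 600, 5.00, 30.00))  # 2400.0
```

Here the input side contributes $1,500 and the output side $900, which shows how an input-heavy request shape can outweigh the higher per-token output rate.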

What is the cheapest AI model API in 2026?

For capable models, DeepSeek V4 Flash and xAI Grok 4.1 Fast are the cheapest per token, then Google Gemini Flash tiers. Open-weight models can be self-hosted to remove per-token API cost entirely. With heavy prompt caching, DeepSeek V4 Flash cache hits fall to $0.0028 per million input tokens (V4 Pro: $0.0150). The cheapest model for your workload depends on your input/output ratio, so use the calculator above.
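
To illustrate how the input/output ratio can change which model is cheapest, here is a minimal Python sketch comparing two models at the list rates from the reference table. The two request shapes are hypothetical, caching and batch are ignored, and the names `RATES`, `cost_per_request`, and `cheapest` are illustrative.

```python
# List rates in USD per 1M tokens (input, output) from the reference table.
RATES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "GLM-5.2": (1.40, 4.40),
}


def cost_per_request(model, in_tokens, out_tokens):
    """USD cost of a single request at list rates, no discounts."""
    in_price, out_price = RATES[model]
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


def cheapest(in_tokens, out_tokens):
    """Name of the cheaper model for the given request shape."""
    return min(RATES, key=lambda m: cost_per_request(m, in_tokens, out_tokens))


print(cheapest(6_000, 600))   # input-heavy, RAG-style request
print(cheapest(500, 2_000))   # output-heavy, long-generation request
```

For the input-heavy shape the model with the lower input rate (Claude Haiku 4.5) comes out ahead; for the output-heavy shape the model with the lower output rate (GLM-5.2) does.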

How much does the GPT-5.5 API cost?

GPT-5.5 costs $5.00 per million input tokens and $30.00 per million output tokens at standard rates, with cached input discounted to $0.50 per million. Batch processing runs at 50% off. For a typical RAG app (6k input, 600 output, 50k requests/month) that is about $2,400/month before caching, or about $1,590/month with 60% of input served from cache.

Does prompt caching really reduce cost?

Substantially. Prompt caching discounts repeated input tokens by roughly 90% at OpenAI, Anthropic, and Google (via cache reads), and by up to about 98% on DeepSeek. For agents with large, stable system prompts that repeat on every call, caching alone can cut total spend by more than half. Set the caching slider to match how much of your prompt repeats.
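
How much caching saves depends on the share of input that repeats and on how much of the bill is input in the first place. The Python sketch below, using GPT-5.5 list rates from the reference table and the 6,000-in / 600-out request shape, computes the fraction of the uncached bill saved and the cache share needed to hit a target saving. The function names are illustrative, and the model assumes cached tokens are simply billed at the cached rate, with no cache-write surcharge.

```python
def caching_saving(cache_share, in_tokens, out_tokens,
                   in_price, out_price, cached_price):
    """Fraction of the uncached bill saved when `cache_share` of input is cached."""
    uncached = in_tokens * in_price + out_tokens * out_price
    saved = cache_share * in_tokens * (in_price - cached_price)
    return saved / uncached


def share_needed(target_saving, in_tokens, out_tokens,
                 in_price, out_price, cached_price):
    """Cache share needed to reach `target_saving`, or None if unreachable."""
    max_saving = caching_saving(1.0, in_tokens, out_tokens,
                                in_price, out_price, cached_price)
    if max_saving <= 0 or target_saving > max_saving:
        return None
    # Saving grows linearly with cache share, so scale proportionally.
    return target_saving / max_saving


# GPT-5.5 list rates, 6,000 input / 600 output tokens per request.
args = (6_000, 600, 5.00, 30.00, 0.50)
print(f"saving at 60% cached: {caching_saving(0.60, *args):.1%}")
print(f"share needed for 50% saving: {share_needed(0.50, *args):.1%}")
```

Under these assumptions a 60% cache share saves roughly a third of the bill, and a little under 90% of the input would need to be cached to halve it, which is why the "more than half" figure applies mainly to prompts that are almost entirely stable.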

Is DeepSeek really cheaper than GPT-5.5 and Claude?

Yes, dramatically. At list rates, DeepSeek V4 Pro ($0.43 input / $0.87 output per million tokens) is roughly 12× cheaper on input and roughly 29–34× cheaper on output than Claude Opus 4.8 ($5.00 / $25.00) and GPT-5.5 ($5.00 / $30.00), and V4 Flash is cheaper still, while staying near-flagship on reasoning. For high-volume, cost-sensitive workloads DeepSeek and Gemini Flash are usually the value leaders; for the hardest reasoning, the frontier models (GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro) still lead.

Embed this calculator — free

Add the AI Cost Calculator to your own site or blog. Copy this snippet:

<iframe src="https://gentic.news/embed/ai-cost-calculator" width="100%" height="920" style="border:1px solid #262626;border-radius:12px" title="AI Cost Calculator by gentic.news" loading="lazy"></iframe>