![The Ultimate Guide to Deep Seek R1: Create Lightning-Fast Web Agents ...](https://the-decoder.com/wp-content/uploads/2024/12/deepseek_whale_logo.png)

![README.md · deepseek-ai/DeepSeek-R1-Distill-Llama-8B at main](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/resolve/main/figures/benchmark.jpg)

Listen to today's AI briefing

Daily podcast — 5 min, AI-narrated summary of top stories

Listen

DeepSeek v4 pricing graph shows $0.43 per million tokens, with a bar chart comparing cost savings against other AI…

Products & LaunchesScore: 100

DeepSeek v4 Pricing Cuts 75%: $0.43/M Tokens In

DeepSeek v4 API pricing permanently cut 75% to $0.43/M input, $0.87/M output, enabled by 27% compute and 10% cache vs v3.2.

AAAla SMITH & AI Research Desk·May 22, 2026·3 min read··288 views·AI-Generated·Report error

Source: x.comvia @kimmonismusWidely Reported

What is the new pricing for DeepSeek v4 API?

DeepSeek v4 API pricing dropped 75% to $0.43 per million input tokens and $0.87 per million output tokens, driven by efficiency gains: 27% compute and 10% cache vs v3.2, per @kimmonismus.

TL;DR

DeepSeek v4 prices cut 75% permanently. · Input: $0.43/M tokens, output: $0.87/M. · Model uses 27% compute vs v3.2.

DeepSeek v4 API pricing dropped 75% to $0.43/M input tokens. The permanent cut, announced via @kimmonismus, undercuts most frontier models by a factor of 5-10x.

Key facts

75% permanent price cut on DeepSeek v4 API.
Input: $0.43/M tokens; Output: $0.87/M tokens.
Uses 27% compute and 10% cache vs v3.2.
Undercuts GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro.
SemiAnalysis published analysis of efficiency gains.

DeepSeek v4 API pricing has been permanently reduced by 75%, according to a post by @kimmonismus on X. Input tokens now cost $0.43 per million tokens, and output tokens are priced at $0.87 per million tokens. This aggressive move positions DeepSeek v4 as the cheapest frontier-class model on the market, undercutting GPT-4o ($2.50/$10 per million), Claude 3.5 Sonnet ($3/$15), and Gemini 1.5 Pro ($1.25/$5) by a wide margin.

The price cut is not a temporary promotion but a structural shift enabled by the model's architecture. According to the DeepSeek v4 technical paper and analysis by SemiAnalysis, the model uses only 27% of the compute and 10% of the cache compared to DeepSeek v3.2. This massive efficiency gain allows DeepSeek to pass savings to customers while maintaining margins.

Why This Matters

The Ultimate Guide to Deep Seek R1: Create Lightning-Fast Web Agents ...

The 75% cut is not merely a pricing war move — it reflects a fundamental difference in DeepSeek's approach to inference optimization. While US labs focus on scaling model size and context windows, DeepSeek has prioritized token-per-dollar efficiency. The result: a model that matches or exceeds v3.2's quality at a fraction of the cost, making it viable for cost-sensitive applications like batch processing, long-form generation, and high-volume API calls.

This also pressures competitors. OpenAI and Anthropic have been raising prices or introducing tiered plans. DeepSeek's permanent low pricing raises the question: can US labs match this cost structure without sacrificing margin?

What's Not Clear

README.md · deepseek-ai/DeepSeek-R1-Distill-Llama-8B at main

DeepSeek has not disclosed whether the pricing applies globally or is region-restricted. The company also hasn't specified if the $0.43/$0.87 rates apply to all context lengths or only up to a certain token limit. Given the 10% cache usage, it's likely that long-context queries may incur higher effective costs due to cache misses, though this has not been confirmed.

What to watch

Watch for DeepSeek's next earnings or blog post detailing cache hit rates and latency under real-world workloads. Also track whether OpenAI or Anthropic respond with price cuts within 90 days — a non-response would signal they cannot match the cost structure.

Source: gentic.news · May 22, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

DeepSeek's 75% price cut is a structural signal, not a tactical discount. The underlying efficiency — 27% compute and 10% cache vs v3.2 — suggests DeepSeek has cracked a level of inference optimization that US labs have not matched. While OpenAI and Anthropic have focused on model quality and safety features, DeepSeek has optimized for cost per token, a metric that matters more for production workloads than benchmark scores. This creates a dilemma for competitors. Matching DeepSeek's pricing would require either absorbing margin loss or achieving similar architectural efficiencies — neither of which is trivial. The likely response is not across-the-board price cuts but targeted discounts for high-volume customers or new tiers that bundle features. The open question is whether DeepSeek's efficiency gains generalize to all workloads. If cache hit rates degrade at longer contexts or under high concurrency, the effective cost may be higher than advertised. DeepSeek has not released detailed performance data under those conditions, leaving room for ambiguity.

#inference optimization #deepseek #ai pricing

Compare side-by-side

DeepSeek V4 vs GPT-4o

→

Mentioned in this article

DeepSeek V4 DeepSeek SemiAnalysis GPT-4o Claude 3.5 Sonnet Gemini 3 Pro

Enjoyed this article?