DeepSeek v4 API pricing dropped 75% to $0.43/M input tokens. The permanent cut, announced via @kimmonismus, undercuts most frontier models by a factor of 5-10x.
Key facts
- 75% permanent price cut on DeepSeek v4 API.
- Input: $0.43/M tokens; Output: $0.87/M tokens.
- Uses 27% compute and 10% cache vs v3.2.
- Undercuts GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro.
- SemiAnalysis published analysis of efficiency gains.
DeepSeek v4 API pricing has been permanently reduced by 75%, according to a post by @kimmonismus on X. Input tokens now cost $0.43 per million tokens, and output tokens are priced at $0.87 per million tokens. This aggressive move positions DeepSeek v4 as the cheapest frontier-class model on the market, undercutting GPT-4o ($2.50/$10 per million), Claude 3.5 Sonnet ($3/$15), and Gemini 1.5 Pro ($1.25/$5) by a wide margin.
The price cut is not a temporary promotion but a structural shift enabled by the model's architecture. According to the DeepSeek v4 technical paper and analysis by SemiAnalysis, the model uses only 27% of the compute and 10% of the cache compared to DeepSeek v3.2. This massive efficiency gain allows DeepSeek to pass savings to customers while maintaining margins.
Why This Matters
The 75% cut is not merely a pricing war move — it reflects a fundamental difference in DeepSeek's approach to inference optimization. While US labs focus on scaling model size and context windows, DeepSeek has prioritized token-per-dollar efficiency. The result: a model that matches or exceeds v3.2's quality at a fraction of the cost, making it viable for cost-sensitive applications like batch processing, long-form generation, and high-volume API calls.
This also pressures competitors. OpenAI and Anthropic have been raising prices or introducing tiered plans. DeepSeek's permanent low pricing raises the question: can US labs match this cost structure without sacrificing margin?
What's Not Clear
DeepSeek has not disclosed whether the pricing applies globally or is region-restricted. The company also hasn't specified if the $0.43/$0.87 rates apply to all context lengths or only up to a certain token limit. Given the 10% cache usage, it's likely that long-context queries may incur higher effective costs due to cache misses, though this has not been confirmed.
What to watch
Watch for DeepSeek's next earnings or blog post detailing cache hit rates and latency under real-world workloads. Also track whether OpenAI or Anthropic respond with price cuts within 90 days — a non-response would signal they cannot match the cost structure.









