gentic.news — AI News Intelligence Platform


DeepSeek V4-Pro: 1.6T parameters, open weights, undercuts rivals 10x

DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window. The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.

Source: the-decoder.com via the_decoder, scmp_tech, bloomberg_tech (corroborated)

Key Takeaways

  • DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window.
  • The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.

DeepSeek V4-Pro and V4-Flash: Open-Weight Giants That Undercut Rivals 10x

Chinese AI lab DeepSeek has released two new open-weight models — V4-Pro and V4-Flash — that combine massive scale with dramatically lower pricing. V4-Pro, with 1.6 trillion total parameters (49 billion active), is now the largest open-weights model available, surpassing Kimi K2.6 (1.1 trillion) and GLM-5.1 (754 billion). V4-Flash comes in at 284 billion total parameters (13 billion active). Both are mixture-of-experts (MoE) models with a one-million-token context window, released under the MIT license on Hugging Face.

The models represent DeepSeek's first new architecture since V3. Every model released in between — V3.1, V3.2, R1, and R1 0528 — was still built on the original V3 design with 685 billion parameters. This is a genuine generational leap.

Key Numbers

| Metric | V4-Pro | V4-Flash |
| --- | --- | --- |
| Total Parameters | 1.6 trillion | 284 billion |
| Active Parameters | 49 billion | 13 billion |
| Context Window | 1M tokens | 1M tokens |
| Training Tokens | Up to 33 trillion | Up to 33 trillion |
| Input Price (per 1M tokens) | $1.74 | $0.14 |
| Output Price (per 1M tokens) | $3.48 | $0.28 |
| GDPval-AA Elo | 1,554 | Not yet reported |

How the New Architecture Cuts Compute by 73–90%

The key innovation is a hybrid attention architecture that combines token compression with DeepSeek's sparse attention. According to the technical report:

  • V4-Pro needs just 27% of the FLOPs and 10% of the KV cache compared to V3.2 when processing a one-million-token context.
  • V4-Flash pushes those numbers even lower — down to 10% of the FLOPs and 7% of the KV cache.
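These percentages follow directly from the reported fractions. A minimal Python sanity check, using only the figures quoted above:

```python
# Relative long-context inference cost at a 1M-token context,
# expressed as a fraction of the V3.2 baseline (1.0), per the report.
reported = {
    "V4-Pro":   {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}

def savings(model: str) -> dict:
    """Percent reduction versus the V3.2 baseline."""
    return {k: round((1 - v) * 100) for k, v in reported[model].items()}

print(savings("V4-Pro"))    # {'flops': 73, 'kv_cache': 90}
print(savings("V4-Flash"))  # {'flops': 90, 'kv_cache': 93}
```

The same arithmetic yields the 73–90% range in the section heading and the cache reductions cited in the FAQ below.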

This is a direct response to the rising cost of long-context inference, which has become a pain point for agentic AI workloads. Competitors like OpenAI, Google, and Anthropic have recently raised prices and capped usage on their long-context models. DeepSeek's architecture makes long contexts cheap to serve.

The models were trained on up to 33 trillion tokens and refined through distillation from in-house specialist models. They run on both Nvidia GPUs and Huawei's Ascend chips, a detail that signals DeepSeek's continued investment in hardware diversity.

Pricing: 10x Cheaper Than GPT-5.5

V4-Flash costs just $0.14 per million input tokens and $0.28 per million output tokens, making it cheaper than OpenAI's GPT-5.4 Nano. V4-Pro comes in at $1.74 and $3.48, significantly undercutting Gemini 3.1 Pro, GPT-5.5, and Claude Sonnet 4.6. For context, GPT-5.5 is priced at roughly $15 per million input tokens — nearly 9x more than V4-Pro.
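To make the gap concrete, the quoted prices can be folded into a per-call cost. A minimal sketch; the 800k-tokens-in / 4k-tokens-out workload is a hypothetical long-context agent call, and since GPT-5.5's output price is not given here, only its input cost is shown:

```python
PRICES = {  # USD per 1M tokens: (input, output), as quoted above
    "V4-Pro":   (1.74, 3.48),
    "V4-Flash": (0.14, 0.28),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the quoted per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A hypothetical long-context agent call: 800k tokens in, 4k out.
print(f"V4-Pro:   ${cost('V4-Pro', 800_000, 4_000):.2f}")    # $1.41
print(f"V4-Flash: ${cost('V4-Flash', 800_000, 4_000):.2f}")  # $0.11
# GPT-5.5 input alone, at roughly $15/M input tokens:
print(f"GPT-5.5 (input only): ${800_000 * 15 / 1_000_000:.2f}")  # $12.00
```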

Benchmark Performance: Close but Not Leading

On Artificial Analysis's GDPval-AA benchmark, V4-Pro leads all open-weights models with 1,554 Elo points, ahead of GLM-5.1 (1,535) and Kimi K2.6 (1,484). That's a jump of roughly 355 Elo points over V3.2.
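For readers less familiar with Elo, rating gaps map to head-to-head win probabilities via the standard logistic Elo formula. The sketch below is our illustration, not part of Artificial Analysis's methodology:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# V4-Pro (1,554) vs GLM-5.1 (1,535): a narrow edge.
print(round(elo_win_prob(1554, 1535), 3))  # 0.527
# V4-Pro (1,554) vs Kimi K2.6 (1,484): a clearer one.
print(round(elo_win_prob(1554, 1484), 3))  # 0.599
```

A 19-point gap is close to a coin flip; the ~355-point jump over V3.2 corresponds to roughly an 89% expected win rate, which is the easier way to read these numbers.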

However, DeepSeek acknowledges in the paper that V4-Pro "falls slightly behind GPT-5.4 and Gemini-3.1-Pro" and trails frontier models by about three to six months. Full testing by Artificial Analysis is still underway, and OpenAI and Anthropic have since released newer models: GPT-5.5 and Opus 4.7.

What This Means in Practice

For developers building agentic AI applications — where long context windows and frequent API calls drive up costs — DeepSeek V4-Flash at $0.14/M input tokens is a game-changer for price-sensitive workloads. The 1M-token context window, combined with the compute-efficient architecture, makes it viable for tasks like processing entire codebases or long document analysis that would be prohibitively expensive on GPT-5.5.
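A back-of-envelope check of the codebase scenario, using the common rough heuristic of ~4 characters per token (an assumption; real tokenizer counts will differ) and a hypothetical 2.8 MB codebase:

```python
CONTEXT_WINDOW = 1_000_000      # tokens, per the article
V4_FLASH_INPUT_PRICE = 0.14     # USD per 1M input tokens, per the article

def estimate_tokens(total_chars: int) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return total_chars // 4

codebase_chars = 2_800_000      # hypothetical: ~2.8 MB of source text
tokens = estimate_tokens(codebase_chars)
fits = tokens <= CONTEXT_WINDOW
cost_usd = tokens * V4_FLASH_INPUT_PRICE / 1_000_000

print(tokens, fits, f"${cost_usd:.3f}")  # 700000 True $0.098
```

At these prices, re-reading the entire codebase on every call stays under a dime; the same input pass at GPT-5.5's ~$15/M price would run over $10.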

However, for tasks requiring frontier-level reasoning or creative generation, the gap to GPT-5.5 and Opus 4.7 remains. Teams should benchmark on their specific use cases before committing.

Technical Details

Both models are released as open weights under the MIT license, available on Hugging Face. The technical paper details training on up to 33 trillion tokens, distillation from in-house specialist models, and hardware support for both Nvidia GPUs and Huawei's Ascend chips. The hybrid attention architecture is the core innovation, combining token compression with sparse attention to dramatically reduce compute for long contexts.

gentic.news Analysis

DeepSeek's V4 launch is the latest chapter in a pricing war that has defined the 2025–2026 AI landscape. A year ago, DeepSeek rattled Silicon Valley with V3 and R1, forcing OpenAI and Google to cut prices. Now, with V4, DeepSeek is doubling down on the same strategy: open-weight models that are "good enough" for most tasks at a fraction of the cost.

What's notable is the timing. Competitors have been raising prices and capping usage on their long-context models, citing the high cost of agentic AI workloads. DeepSeek's hybrid attention architecture directly addresses this pain point, making long contexts cheap to serve. This is a classic market disruption play: while incumbents optimize for margin on premium products, DeepSeek optimizes for cost at scale.

The hardware diversity angle is also worth watching. DeepSeek's support for Huawei's Ascend chips alongside Nvidia GPUs gives it supply chain resilience that Western labs lack. If US export controls tighten further, DeepSeek could maintain production while competitors scramble for Nvidia allocation.

However, the benchmark gap is real. V4-Pro trails GPT-5.5 and Opus 4.7 by three to six months. For applications where model quality directly impacts revenue — like code generation for enterprise deployment or high-stakes medical diagnosis — the extra cost of frontier models may still be justified. DeepSeek's play is for the long tail of use cases where "good enough" at 10x lower cost wins.

This aligns with a broader trend we've covered: the commoditization of foundation models. As open-weight models close the gap to frontier labs, the competitive moat shifts from model quality to distribution, infrastructure, and ecosystem. DeepSeek's MIT license and aggressive pricing are designed to capture developer mindshare, similar to how Meta's Llama strategy played out in 2023–2024.

Frequently Asked Questions

How does DeepSeek V4-Pro compare to GPT-5.5?

DeepSeek V4-Pro trails GPT-5.5 by about three to six months on benchmark performance, according to the company's own technical report. On GDPval-AA, V4-Pro scores 1,554 Elo points, leading all open-weight models but falling slightly behind GPT-5.4 and Gemini-3.1-Pro. OpenAI has since released GPT-5.5, widening the gap.

What makes DeepSeek V4's architecture different from V3?

V4 introduces a hybrid attention architecture that combines token compression with sparse attention. This reduces FLOPs by 73% for V4-Pro and 90% for V4-Flash when processing a one-million-token context, compared to V3.2. The KV cache requirements also drop by 90% and 93% respectively. This is the first new architecture from DeepSeek since V3.

How much does DeepSeek V4 cost compared to competitors?

V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens — cheaper than OpenAI's GPT-5.4 Nano. V4-Pro costs $1.74 and $3.48, roughly 9x cheaper than GPT-5.5 ($15/M input tokens) and significantly undercutting Gemini 3.1 Pro and Claude Sonnet 4.6.

Is DeepSeek V4 open-source?

Yes, both V4-Pro and V4-Flash are released as open weights under the MIT license. They are available on Hugging Face. The technical paper detailing architecture, training data, and hardware is also publicly available.
