gentic.news — AI News Intelligence Platform


DeepSeek V4-Pro: 1.6T parameters, open weights, undercuts rivals 10x

DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window. The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.

Source: the-decoder.com via the_decoder, scmp_tech, bloomberg_tech (corroborated)

Key Takeaways

  • DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window.
  • The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.

DeepSeek V4-Pro and V4-Flash: Open-Weight Giants That Undercut Rivals 10x

Chinese AI lab DeepSeek has released two new open-weight models — V4-Pro and V4-Flash — that combine massive scale with dramatically lower pricing. V4-Pro, with 1.6 trillion total parameters (49 billion active), is now the largest open-weights model available, surpassing Kimi K2.6 (1.1 trillion) and GLM-5.1 (754 billion). V4-Flash comes in at 284 billion total parameters (13 billion active). Both are mixture-of-experts (MoE) models with a one-million-token context window, released under the MIT license on Hugging Face.

The models represent DeepSeek's first new architecture since V3. Every model released in between — V3.1, V3.2, R1, and R1 0528 — was still built on the original V3 design with 685 billion parameters. This is a genuine generational leap.

Key Numbers

| Metric | V4-Pro | V4-Flash |
| --- | --- | --- |
| Total Parameters | 1.6 trillion | 284 billion |
| Active Parameters | 49 billion | 13 billion |
| Context Window | 1M tokens | 1M tokens |
| Training Tokens | Up to 33 trillion | Up to 33 trillion |
| Input Price (per 1M tokens) | $1.74 | $0.14 |
| Output Price (per 1M tokens) | $3.48 | $0.28 |
| GDPval-AA Elo | 1,554 | Not yet reported |

How the New Architecture Cuts Compute by 73–90%

The key innovation is a hybrid attention architecture that combines token compression with DeepSeek's sparse attention. According to the technical report:

  • V4-Pro needs just 27% of the FLOPs and 10% of the KV cache compared to V3.2 when processing a one-million-token context.
  • V4-Flash pushes those numbers even lower — down to 10% of the FLOPs and 7% of the KV cache.
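These percentages follow directly from the reported fractions. A minimal Python sanity check, using only the figures quoted above:

```python
# Relative long-context inference cost at a 1M-token context,
# expressed as a fraction of the V3.2 baseline (1.0), per the report.
reported = {
    "V4-Pro":   {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}

def savings(model: str) -> dict:
    """Percent reduction versus the V3.2 baseline."""
    return {k: round((1 - v) * 100) for k, v in reported[model].items()}

print(savings("V4-Pro"))    # {'flops': 73, 'kv_cache': 90}
print(savings("V4-Flash"))  # {'flops': 90, 'kv_cache': 93}
```

The same arithmetic yields the 73–90% range in the section heading and the cache reductions cited in the FAQ below.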

This is a direct response to the rising cost of long-context inference, which has become a pain point for agentic AI workloads. Competitors like OpenAI, Google, and Anthropic have recently raised prices and capped usage on their long-context models. DeepSeek's architecture makes long contexts cheap to serve.

The models were trained on up to 33 trillion tokens and refined through distillation from in-house specialist models. They run on both Nvidia GPUs and Huawei's Ascend chips, a detail that signals DeepSeek's continued investment in hardware diversity.

Pricing: 10x Cheaper Than GPT-5.5

V4-Flash costs just $0.14 per million input tokens and $0.28 per million output tokens, making it cheaper than OpenAI's GPT-5.4 Nano. V4-Pro comes in at $1.74 and $3.48, significantly undercutting Gemini 3.1 Pro, GPT-5.5, and Claude Sonnet 4.6. For context, GPT-5.5 is priced at roughly $15 per million input tokens — nearly 9x more than V4-Pro.
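To make the gap concrete, the quoted prices can be folded into a per-call cost. A minimal sketch; the 800k-tokens-in / 4k-tokens-out workload is a hypothetical long-context agent call, and since GPT-5.5's output price is not given here, only its input cost is shown:

```python
PRICES = {  # USD per 1M tokens: (input, output), as quoted above
    "V4-Pro":   (1.74, 3.48),
    "V4-Flash": (0.14, 0.28),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the quoted per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A hypothetical long-context agent call: 800k tokens in, 4k out.
print(f"V4-Pro:   ${cost('V4-Pro', 800_000, 4_000):.2f}")    # $1.41
print(f"V4-Flash: ${cost('V4-Flash', 800_000, 4_000):.2f}")  # $0.11
# GPT-5.5 input alone, at roughly $15/M input tokens:
print(f"GPT-5.5 (input only): ${800_000 * 15 / 1_000_000:.2f}")  # $12.00
```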

Benchmark Performance: Close but Not Leading

On Artificial Analysis's GDPval-AA benchmark, V4-Pro leads all open-weights models with 1,554 Elo points, ahead of GLM-5.1 (1,535) and Kimi K2.6 (1,484). That's a jump of roughly 355 Elo points over V3.2.
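For readers less familiar with Elo, rating gaps map to head-to-head win probabilities via the standard logistic Elo formula. The sketch below is our illustration, not part of Artificial Analysis's methodology:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# V4-Pro (1,554) vs GLM-5.1 (1,535): a narrow edge.
print(round(elo_win_prob(1554, 1535), 3))  # 0.527
# V4-Pro (1,554) vs Kimi K2.6 (1,484): a clearer one.
print(round(elo_win_prob(1554, 1484), 3))  # 0.599
```

A 19-point gap is close to a coin flip; the ~355-point jump over V3.2 corresponds to roughly an 89% expected win rate, which is the easier way to read these numbers.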

However, DeepSeek acknowledges in the paper that V4-Pro "falls slightly behind GPT-5.4 and Gemini-3.1-Pro" and trails frontier models by about three to six months. Full testing by Artificial Analysis is still underway, and OpenAI and Anthropic have since released newer models: GPT-5.5 and Opus 4.7.

What This Means in Practice

For developers building agentic AI applications — where long context windows and frequent API calls drive up costs — DeepSeek V4-Flash at $0.14/M input tokens is a game-changer for price-sensitive workloads. The 1M-token context window, combined with the compute-efficient architecture, makes it viable for tasks like processing entire codebases or long document analysis that would be prohibitively expensive on GPT-5.5.
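A back-of-envelope check of the codebase scenario, using the common rough heuristic of ~4 characters per token (an assumption; real tokenizer counts will differ) and a hypothetical 2.8 MB codebase:

```python
CONTEXT_WINDOW = 1_000_000      # tokens, per the article
V4_FLASH_INPUT_PRICE = 0.14     # USD per 1M input tokens, per the article

def estimate_tokens(total_chars: int) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return total_chars // 4

codebase_chars = 2_800_000      # hypothetical: ~2.8 MB of source text
tokens = estimate_tokens(codebase_chars)
fits = tokens <= CONTEXT_WINDOW
cost_usd = tokens * V4_FLASH_INPUT_PRICE / 1_000_000

print(tokens, fits, f"${cost_usd:.3f}")  # 700000 True $0.098
```

At these prices, re-reading the entire codebase on every call stays under a dime; the same input pass at GPT-5.5's ~$15/M price would run over $10.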

However, for tasks requiring frontier-level reasoning or creative generation, the gap to GPT-5.5 and Opus 4.7 remains. Teams should benchmark on their specific use cases before committing.

Technical Details

Both models are released as open weights under the MIT license, available on Hugging Face. The technical paper details training on up to 33 trillion tokens, distillation from in-house specialist models, and hardware support for both Nvidia GPUs and Huawei's Ascend chips. The hybrid attention architecture is the core innovation, combining token compression with sparse attention to dramatically reduce compute for long contexts.

gentic.news Analysis

DeepSeek's V4 launch is the latest chapter in a pricing war that has defined the 2025–2026 AI landscape. A year ago, DeepSeek rattled Silicon Valley with V3 and R1, forcing OpenAI and Google to cut prices. Now, with V4, DeepSeek is doubling down on the same strategy: open-weight models that are "good enough" for most tasks at a fraction of the cost.

What's notable is the timing. Competitors have been raising prices and capping usage on their long-context models, citing the high cost of agentic AI workloads. DeepSeek's hybrid attention architecture directly addresses this pain point, making long contexts cheap to serve. This is a classic market disruption play: while incumbents optimize for margin on premium products, DeepSeek optimizes for cost at scale.

The hardware diversity angle is also worth watching. DeepSeek's support for Huawei's Ascend chips alongside Nvidia GPUs gives it supply chain resilience that Western labs lack. If US export controls tighten further, DeepSeek could maintain production while competitors scramble for Nvidia allocation.

However, the benchmark gap is real. V4-Pro trails GPT-5.5 and Opus 4.7 by three to six months. For applications where model quality directly impacts revenue — like code generation for enterprise deployment or high-stakes medical diagnosis — the extra cost of frontier models may still be justified. DeepSeek's play is for the long tail of use cases where "good enough" at 10x lower cost wins.

This aligns with a broader trend we've covered: the commoditization of foundation models. As open-weight models close the gap to frontier labs, the competitive moat shifts from model quality to distribution, infrastructure, and ecosystem. DeepSeek's MIT license and aggressive pricing are designed to capture developer mindshare, similar to how Meta's Llama strategy played out in 2023–2024.

Frequently Asked Questions

How does DeepSeek V4-Pro compare to GPT-5.5?

DeepSeek V4-Pro trails GPT-5.5 by about three to six months on benchmark performance, according to the company's own technical report. On GDPval-AA, V4-Pro scores 1,554 Elo points, leading all open-weight models but falling slightly behind GPT-5.4 and Gemini-3.1-Pro. OpenAI has since released GPT-5.5, widening the gap.

What makes DeepSeek V4's architecture different from V3?

V4 introduces a hybrid attention architecture that combines token compression with sparse attention. This reduces FLOPs by 73% for V4-Pro and 90% for V4-Flash when processing a one-million-token context, compared to V3.2. The KV cache requirements also drop by 90% and 93% respectively. This is the first new architecture from DeepSeek since V3.

How much does DeepSeek V4 cost compared to competitors?

V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens — cheaper than OpenAI's GPT-5.4 Nano. V4-Pro costs $1.74 and $3.48, roughly 9x cheaper than GPT-5.5 ($15/M input tokens) and significantly undercutting Gemini 3.1 Pro and Claude Sonnet 4.6.

Is DeepSeek V4 open-source?

Yes, both V4-Pro and V4-Flash are released as open weights under the MIT license. They are available on Hugging Face. The technical paper detailing architecture, training data, and hardware is also publicly available.
