DeepSeek V4 Is Out: Here Is Everything You Need to Know
- AI News
- 5 min read
- Published: April 24, 2026
- Harish Prajapat
DeepSeek dropped V4 on April 24, 2026, and the AI community has been picking through every benchmark since. The release comes with two model variants, a 1 million token context window baked in as the new default, and technical claims that put it squarely in competition with the best proprietary models on the market right now.
Here is a straight breakdown of what DeepSeek V4 actually is, what it can do, how it performs, and how it compares to GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro.
Two Models, Two Use Cases
DeepSeek V4 ships as two separate variants built for different needs.
DeepSeek V4-Pro is the flagship. It carries 1.6 trillion total parameters but runs on only 49 billion active parameters at inference time. That combination is possible because of the Mixture of Experts (MoE) architecture, where the model routes each input through a relevant subset of its parameters rather than activating the entire network every time. The result is frontier-level reasoning at a fraction of what you would expect to pay for a model of this caliber.
DeepSeek V4-Flash is the lighter variant, with 284 billion total parameters and 13 billion active. It is built for speed and cost efficiency. According to DeepSeek’s release notes, Flash’s reasoning capabilities closely approach those of V4-Pro for most tasks, and it performs on par with the Pro model on simple agentic workflows. For developers who need fast API responses at scale, Flash is the practical choice.
Both models ship with the full 1 million token context window enabled by default across all DeepSeek services.
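The sparse-activation idea behind both variants can be illustrated with a toy top-k router. This is a minimal sketch of Mixture-of-Experts routing, not DeepSeek's actual architecture; the dimensions, expert count, and gating scheme here are invented for illustration.

```python
import math
import random

def moe_forward(x, experts, gate, k=2):
    """Toy Mixture-of-Experts layer: score every expert with a linear gate,
    but run only the top-k. For any one token, most expert parameters
    (the analogue of V4-Pro's 1.6T total vs 49B active) stay untouched."""
    scores = [sum(xi * gi for xi, gi in zip(x, g)) for g in gate]
    top_k = sorted(range(len(experts)), key=scores.__getitem__)[-k:]
    exps = [math.exp(scores[i]) for i in top_k]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over chosen experts only
    out = [0.0] * len(x)
    for w, i in zip(weights, top_k):
        hidden = [sum(xi * wij for xi, wij in zip(x, row)) for row in experts[i]]
        out = [o + w * h for o, h in zip(out, hidden)]
    return out, top_k

random.seed(0)
d, n_experts = 8, 16
experts = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
           for _ in range(n_experts)]
gate = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
x = [random.gauss(0, 1) for _ in range(d)]
out, active = moe_forward(x, experts, gate, k=2)   # only 2 of 16 experts run
```

The routing is what makes the parameter arithmetic work: inference cost scales with the active subset, while total capacity scales with the full expert pool.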
What Is New in V4
DeepSeek V4 is a significant architectural step beyond V3. The headline upgrade is a new hybrid attention mechanism that combines two approaches: Compressed Sparse Attention and Heavily Compressed Attention (which DeepSeek calls DSA). Together, these reduce the compute and memory burden of processing long contexts dramatically.
The numbers are striking. For a 1 million token context, V4-Pro requires only 27% of the single-token inference FLOPs of its predecessor and just 10% of the KV cache. That matters in practice because long-context inference is one of the most expensive operations in production AI deployments. Getting the same capability at a fraction of the cost makes 1M context viable for a lot more teams.
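To make those ratios concrete, here is a quick arithmetic sketch. Only the 27% and 10% figures come from the release; the baseline numbers below are hypothetical placeholders, since DeepSeek did not publish absolute V3 costs.

```python
def v4_long_context_cost(v3_flops_per_token, v3_kv_cache_gb):
    """Apply DeepSeek's reported reductions for a 1M-token context:
    27% of the predecessor's single-token inference FLOPs,
    10% of its KV cache footprint."""
    return v3_flops_per_token * 0.27, v3_kv_cache_gb * 0.10

# Hypothetical baseline: 1.0 (normalized) FLOPs per token, 400 GB KV cache at 1M tokens.
flops, kv_gb = v4_long_context_cost(1.0, 400.0)
```

The KV-cache reduction is the more consequential number operationally, since KV memory, not compute, is usually what forces long-context serving onto more GPUs.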
The model also runs on FP4 + FP8 mixed precision. Experts use FP4 while other parameters run on FP8. This lowers memory requirements further without a meaningful quality drop.
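A back-of-envelope weight-memory estimate shows why that split matters. The fraction of parameters living in the experts is an assumed number here, not a figure from DeepSeek, though MoE models of this shape typically keep the vast majority of weights in the expert layers.

```python
def weight_memory_gb(total_params, expert_fraction,
                     fp4_bytes=0.5, fp8_bytes=1.0):
    """Rough weight-storage estimate for mixed precision:
    expert parameters stored in FP4 (4 bits = 0.5 bytes each),
    all other parameters in FP8 (1 byte each)."""
    expert_params = total_params * expert_fraction
    other_params = total_params - expert_params
    return (expert_params * fp4_bytes + other_params * fp8_bytes) / 1e9

# 1.6T total parameters; assume (hypothetically) 95% of them sit in the experts.
mem_gb = weight_memory_gb(1.6e12, 0.95)
```

Under that assumption the weights alone come to roughly 840 GB, versus 1.6 TB if everything were held in FP8, which is the kind of saving that decides how many accelerators a self-hosted deployment needs.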
On the capabilities side, V4 adds enhanced agentic features and expands its support for both the OpenAI ChatCompletions API format and the Anthropic API format. Teams already using either standard can drop V4 in without rewriting their integration code.
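The two formats differ mainly in where the system prompt lives, so moving between them is mostly mechanical. The converter below is a minimal sketch covering only plain text messages; it assumes the standard shapes of the OpenAI ChatCompletions and Anthropic Messages request bodies and ignores tools, images, and streaming options.

```python
def openai_to_anthropic(req):
    """Sketch: convert an OpenAI ChatCompletions-style request body into
    the Anthropic Messages shape. System messages move to the top-level
    "system" field; user/assistant turns pass through unchanged."""
    system_parts = [m["content"] for m in req["messages"] if m["role"] == "system"]
    messages = [m for m in req["messages"] if m["role"] != "system"]
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # required by Anthropic's format
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out

body = openai_to_anthropic({
    "model": "deepseek-v4-pro",   # model ID as given in the article
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Summarize MoE in one line."},
    ],
})
```

Since V4 accepts both formats natively, a shim like this is only needed when talking to other providers; against DeepSeek's own endpoint either body should work as-is.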
Benchmark Results
DeepSeek published benchmark numbers alongside the release. Here are the verified scores for V4-Pro:
| Benchmark | DeepSeek V4-Pro Score |
|---|---|
| MMLU (5-shot) | 90.1% |
| HumanEval (0-shot) | 76.8% |
| LiveCodeBench (Max mode) | 93.5% |
| CorpusQA 1M long-context | 62.0% accuracy |
The LiveCodeBench number in Max thinking mode is the one people are talking about most. A 93.5% score puts V4-Pro at or above what several closed-source models have posted publicly, and DeepSeek describes it as “open-source state-of-the-art in agentic coding benchmarks.”
For world knowledge tasks, DeepSeek acknowledges V4-Pro leads all current open models but sits behind Gemini-3.1-Pro. For math, STEM, and coding, the claim is that V4-Pro beats every open model currently available and rivals the top closed-source alternatives.
Pricing
DeepSeek V4-Pro API pricing through the official platform:
| Usage type | Price |
|---|---|
| Input tokens | $1.74 per million |
| Output tokens | $3.48 per million |
| Cached input | $0.145 per million |
For reference, that is a fraction of what comparable proprietary models charge per token. The cached pricing at $0.145 per million tokens is particularly notable for production applications that reuse context repeatedly, like RAG pipelines and multi-turn agent systems.
DeepSeek V4-Flash pricing is lower, though the exact numbers were not listed in the launch documentation.
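The published V4-Pro rates make request-level cost estimates straightforward. The example workload below (token counts, cache-hit ratio) is hypothetical; the per-token prices are the ones from the table above.

```python
PRICES = {  # USD per million tokens, from DeepSeek's V4-Pro price list
    "input": 1.74,
    "output": 3.48,
    "cached_input": 0.145,
}

def request_cost(input_toks, output_toks, cached_toks=0):
    """Estimate the USD cost of one API call. Tokens served from cache
    are billed at the cached-input rate instead of the full input rate."""
    fresh_toks = input_toks - cached_toks
    return (fresh_toks * PRICES["input"]
            + cached_toks * PRICES["cached_input"]
            + output_toks * PRICES["output"]) / 1e6

# A RAG-style call: 50K-token context, 45K of it cached, plus a 1K-token answer.
cost = request_cost(50_000, 1_000, cached_toks=45_000)
```

In this scenario caching cuts the input bill from $87.00 to $14.85 per thousand such requests, which is why the cached rate matters so much for multi-turn agents and RAG pipelines that replay the same context.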
Thinking Modes
V4 introduces three reasoning modes, giving developers per-request control over the quality-cost tradeoff.
Non-Think mode disables extended reasoning and returns a direct response. Best for simple queries, fast retrieval tasks, and applications where latency matters more than depth.
Think High activates intermediate-level reasoning. Suitable for standard coding tasks, analysis, and structured writing.
Think Max unlocks full chain-of-thought reasoning. This is the mode that produced the 93.5% LiveCodeBench score. It costs more and runs slower, but for complex multi-step problems it is the most capable setting the model offers.
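In practice this tradeoff gets encoded as routing logic in the calling application. The heuristic and the mode labels below are a sketch based on the descriptions above; how the modes map onto an actual API parameter is not documented in the release notes, so treat the strings as placeholders.

```python
def pick_thinking_mode(task_type, latency_sensitive=False):
    """Heuristic sketch of the quality-cost routing the three modes enable.
    Mode labels mirror the article; the real request parameter is an
    assumption, not something the launch docs specify."""
    if latency_sensitive or task_type in {"lookup", "retrieval", "simple_chat"}:
        return "non-think"    # direct response, lowest latency and cost
    if task_type in {"coding", "analysis", "structured_writing"}:
        return "think-high"   # intermediate reasoning for standard work
    return "think-max"        # full chain-of-thought for hard multi-step problems

mode = pick_thinking_mode("coding")
```

A router like this lets one deployment serve cheap retrieval traffic and expensive agentic traffic from the same model, paying for extended reasoning only where it earns its cost.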
DeepSeek V4 vs GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro
This is the comparison most people are actually interested in. Here is an honest breakdown based on available public benchmarks and documented capabilities.
| Feature | DeepSeek V4-Pro | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Parameters | 1.6T total / 49B active | Undisclosed | Undisclosed | Undisclosed |
| Context window | 1M tokens | 128K tokens | 200K tokens | 1M tokens |
| Open source | Yes | No | No | No |
| Coding (LiveCodeBench) | 93.5% (Max) | Competitive | Strong | Competitive |
| World knowledge | Strong | Very strong | Very strong | Best in class |
| Reasoning modes | 3 modes | Standard | Extended thinking | Standard |
| Input pricing / 1M tokens | $1.74 | ~$15+ | ~$15+ | ~$7+ |
| Agentic coding | SOTA open-source | Strong | Strong | Good |
| API compatibility | OpenAI + Anthropic format | OpenAI format | Anthropic format | Google format |
| Long context accuracy | 62% at 1M (CorpusQA) | Varies | Varies | Strong at 1M |
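The pricing row is where the table translates most directly into budget decisions. The sketch below projects monthly input spend at a fixed daily volume; the non-DeepSeek prices are the table's rough "~$15+" and "~$7+" figures, not exact published rates, and output tokens are excluded.

```python
# Approximate input prices per 1M tokens, taken from the comparison table.
INPUT_PRICE = {
    "deepseek-v4-pro": 1.74,
    "gpt-5.5": 15.0,          # table lists ~$15+
    "claude-opus-4.7": 15.0,  # table lists ~$15+
    "gemini-3.1-pro": 7.0,    # table lists ~$7+
}

def monthly_input_cost(model, tokens_per_day, days=30):
    """USD input-token spend at a steady daily volume (output tokens excluded)."""
    return INPUT_PRICE[model] * tokens_per_day * days / 1e6

# At 100M input tokens/day, the per-token gap compounds into a large monthly delta.
deepseek_monthly = monthly_input_cost("deepseek-v4-pro", 100e6)
gpt_monthly = monthly_input_cost("gpt-5.5", 100e6)
```

Under these assumptions that is roughly $5.2K versus $45K per month on input tokens alone, which is the scale of difference the head-to-head sections below keep coming back to.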
DeepSeek V4 vs GPT-5.5
GPT-5.5 remains the benchmark for general-purpose intelligence and world knowledge tasks. OpenAI’s model edges ahead on nuanced reasoning, creative writing quality, and instruction following for open-ended tasks. The gap on math and coding is much narrower than it was a year ago, and V4-Pro’s Max thinking mode closes it further.
Where V4-Pro wins clearly: price. At $1.74 per million input tokens versus GPT-5.5’s substantially higher pricing, teams running high-volume API workloads are paying attention. For coding-heavy pipelines and agentic workflows, the cost difference at scale is significant.
DeepSeek V4 vs Claude Opus 4.7
Claude Opus 4.7 is the strongest model in the Claude lineup for complex reasoning, long-form writing, and code generation. Its 200K token context window is solid, though V4-Pro’s 1M window gives it a practical edge for tasks that require processing large codebases, full document sets, or long conversation histories.
Claude’s strength is in nuanced instruction following and writing quality. For tasks that require careful, multi-step reasoning presented in clear language, Opus 4.7 is competitive. For pure coding benchmarks at this point, V4-Pro in Max mode is posting better numbers.
DeepSeek V4 vs Gemini 3.1 Pro
This is the closest comparison. DeepSeek’s own release notes acknowledge Gemini 3.1 Pro leads on world knowledge tasks. Both models support 1M token contexts natively, and both are optimized for long-context efficiency.
Where they split: Gemini 3.1 Pro is Google’s closed-source product with the pricing that comes with it. DeepSeek V4-Pro is open-source, significantly cheaper per token, and available for self-hosting. On coding and math benchmarks, V4-Pro claims the edge over Gemini. On general knowledge and multimodal tasks, Gemini 3.1 Pro holds its own.
Key Dates to Know
DeepSeek also used the V4 launch to set a retirement timeline for older models. The legacy deepseek-chat and deepseek-reasoner identifiers will be fully retired on July 24, 2026 at 15:59 UTC. Developers using those model IDs need to migrate to deepseek-v4-pro or deepseek-v4-flash before that date.
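For most codebases the migration can be a small shim around the model ID. The mapping below pairs each legacy ID with a V4 variant as a suggestion only; DeepSeek has not published an official legacy-to-V4 correspondence, so choose Pro or Flash based on your own workload.

```python
# Suggested legacy-to-V4 mapping ahead of the July 24, 2026 retirement.
# Which V4 variant replaces each legacy ID is a judgment call, not official.
LEGACY_MODEL_MAP = {
    "deepseek-chat": "deepseek-v4-flash",    # general chat -> fast, cheap tier
    "deepseek-reasoner": "deepseek-v4-pro",  # reasoning workloads -> flagship
}

def migrate_model_id(model_id):
    """Rewrite a legacy DeepSeek model ID; pass through anything already current."""
    return LEGACY_MODEL_MAP.get(model_id, model_id)
```

Centralizing the ID in one helper like this means the cutover is a one-line change rather than a grep across every call site.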
Final Word
DeepSeek V4 is not a quiet update. The combination of open-source availability, 1M token context at production-friendly pricing, and competitive benchmark performance on coding tasks gives it a serious position in the current model landscape.
For teams running agentic pipelines, code generation at scale, or long-document processing, V4-Pro in Think Max mode deserves real evaluation time alongside the closed-source options.
You can read DeepSeek’s full technical announcement on the company’s official site.
Frequently Asked Questions
What is DeepSeek V4?
DeepSeek V4 is the latest language model release from DeepSeek, launched April 24, 2026. It comes in two variants: V4-Pro with 1.6 trillion parameters and V4-Flash with 284 billion. Both include a 1M token context window, thinking modes, and API support for both OpenAI and Anthropic formats.
Is DeepSeek V4 open source?
Yes. DeepSeek V4 is open source, which means developers can self-host it or access it through the official API and third-party providers like DeepInfra.
How much does the DeepSeek V4 API cost?
The API pricing is $1.74 per million input tokens, $3.48 per million output tokens, and $0.145 per million cached tokens. This is considerably cheaper than most comparable proprietary models.
How large is the context window?
Both V4-Pro and V4-Flash include a 1 million token context window as the default. This is among the largest context windows of any model currently available.
What is the difference between V4-Pro and V4-Flash?
V4-Pro is the flagship with 1.6T parameters and higher benchmark scores, particularly in complex reasoning and coding. V4-Flash has 284B parameters and prioritizes speed and lower cost. Flash’s reasoning approaches Pro on most standard tasks.
How does DeepSeek V4 compare to GPT-5.5?
V4-Pro is competitive on coding and math benchmarks and significantly cheaper per token. GPT-5.5 has an edge on world knowledge, general reasoning, and writing quality. For code-heavy or agentic workflows at volume, V4-Pro is the more cost-effective choice.
When are the legacy model IDs retired?
The legacy deepseek-chat and deepseek-reasoner model identifiers retire on July 24, 2026. Developers should migrate to deepseek-v4-pro or deepseek-v4-flash before that date.
