DeepSeek is back. The Chinese AI firm that rattled the entire industry in January 2025 with its shockingly efficient R1 model has released V4, its most ambitious model since. It comes in two versions – V4-Pro, built for heavy coding and agentic tasks, and V4-Flash, a lighter, faster variant – and both are open source. But beyond the headline numbers, there are three things worth paying attention to.
From the start, DeepSeek has marketed itself as the value option against the American giants, and V4 is no exception. V4-Pro costs $1.74 per million input tokens and $3.48 per million output tokens, far below what OpenAI or Anthropic charge for a similar performance level. V4-Flash comes in at $0.14 and $0.28 per million input and output tokens, respectively, making it one of the most budget-friendly options in its class.
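To make those rates concrete, here is a quick back-of-the-envelope cost calculation at the listed prices. The workload numbers below (200k tokens in, 20k out) are invented purely for illustration, not drawn from any benchmark:

```python
# Back-of-the-envelope cost at DeepSeek's listed V4 prices (USD per 1M tokens).
# The example workload below is hypothetical, chosen only to illustrate scale.

PRICES = {
    "V4-Pro":   {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical agentic coding session: 200k tokens in, 20k tokens out.
pro = request_cost("V4-Pro", 200_000, 20_000)
flash = request_cost("V4-Flash", 200_000, 20_000)
print(f"V4-Pro:   ${pro:.4f}")    # $0.4176
print(f"V4-Flash: ${flash:.4f}")  # $0.0336
```

At these prices, even a heavy long-context session costs well under a dollar on V4-Pro, and pennies on V4-Flash.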
On benchmarks, DeepSeek claims V4-Pro matches Anthropic’s Claude Opus 4.6, OpenAI’s GPT-5.4, and Google’s Gemini 3.1 – while outperforming every other open-source model on coding, math, and STEM. An internal survey of 85 developers found more than 90% ranked V4-Pro among their top choices for coding tasks. That last number is self-reported and should be taken with a grain of salt, but the pricing alone gives developers a real reason to experiment.
Both V4 versions support a one-million-token context window, large enough to hold the entire Lord of the Rings trilogy. That alone isn't novel (pun intended): Gemini and Claude already operate at similar scales. What sets DeepSeek apart is how it got there.
Rather than treating every part of a long prompt as equally important, V4 compresses older context and focuses compute on what's most likely to matter right now. The result is dramatic: in a one-million-token context, V4-Pro uses just 27% of the compute and 10% of the memory that its predecessor V3.2 required. V4-Flash is even leaner, at 10% of the compute and 7% of the memory. For anyone building tools that need to reason across an entire codebase or a large document archive, that efficiency gap is meaningful: it translates directly into lower costs and faster responses at scale.
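DeepSeek hasn't published the exact mechanism described here, but the arithmetic behind "compress older context" is easy to illustrate. The toy sketch below keeps a recent window at full resolution and compresses everything older by a fixed factor; the window size and compression ratio are invented parameters, not DeepSeek's:

```python
# Toy model of "compress older context, focus compute on the recent window".
# This is NOT DeepSeek's published mechanism -- just an illustration of why
# compressing the tail shrinks attention cost. All parameters are invented.

def effective_kv_tokens(total: int, recent_window: int = 32_000,
                        compression: int = 16) -> int:
    """KV entries actually attended to: recent tokens kept at full
    resolution, older tokens compressed by `compression`x."""
    recent = min(total, recent_window)
    older = total - recent
    return recent + older // compression

full = 1_000_000                      # naive: attend to every token
eff = effective_kv_tokens(full)       # 32k full + 968k compressed 16x
print(eff, f"= {eff / full:.0%} of the naive KV size")
```

Even this crude scheme cuts the attended context to under a tenth of its naive size, which is the same order of saving DeepSeek reports.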
This is the most geopolitically loaded aspect of V4, and probably the least understood. For the first time, DeepSeek has optimised a model for domestic Chinese chips – specifically Huawei's Ascend series – and it notably denied Nvidia and AMD early access ahead of launch. Huawei has confirmed its Ascend 950 supernodes will support V4, and DeepSeek says prices could fall further once those chips ship at scale later this year.
But the move is partial. DeepSeek appears to have adapted V4 for Chinese chips primarily for inference, when users query the model, while training may still rely heavily on Nvidia hardware. Researchers note that Chinese chips remain behind Nvidia’s on raw training performance. What V4 represents, then, isn’t a clean break from American silicon. It’s a first, deliberate step toward building a parallel AI infrastructure, one that Beijing has been pushing for, and that the industry will be watching closely.
V4 probably won’t shake the world the way R1 did. That moment was a surprise; this one was anticipated. But across pricing, architecture, and chip strategy, DeepSeek is making moves that matter, and developers, investors, and policymakers would be unwise to scroll past.