Almanac
Guide · Beginner

DeepSeek V4: The Open-Weights Giant Reshaping AI Economics

DeepSeek V4Beginneractive·v1 · live·generated 6d ago

Part of these paths

TL;DRDeepSeek V4 is a family of massive open-weights AI models from the Chinese lab DeepSeek, built to rival the world's best closed-source systems at a fraction of the cost. It arrives as the culmination of a rapid model lineage — from V3 through several V3.x iterations — and pushes the frontier on long-context reasoning and agentic coding while permanently slashing its own prices. Its rise has also sparked geopolitical friction, from allegations of large-scale data theft to hardware access decisions that reflect deepening US-China AI tensions.

Key takeaways

  • V4 comes in two open-weights variants: V4-Pro (1.6 trillion total parameters, 49B active) and V4-Flash (284B total, 13B active), both with a 1M-token context window by default.
  • DeepSeek permanently cut V4-Pro prices by 75%, continuing a pattern of aggressive downward pressure on frontier model API costs.
  • V4 uses a novel DeepSeek Sparse Attention (DSA) architecture first tested in V3.2-Exp, enabling efficient long-context handling.
  • Anthropic publicly accused DeepSeek of conducting industrial-scale 'distillation attacks' — generating over 16 million exchanges via ~24,000 fraudulent accounts to copy Claude's capabilities.
  • DeepSeek gave Huawei weeks of pre-release access to V4 for hardware optimization while denying the same to Nvidia and AMD, signaling a deliberate alignment with China's domestic chip ecosystem.
  • One industry analysis noted V4 trails leading open and closed models on aggregate benchmarks, despite strong performance on specific agentic coding tasks.

What DeepSeek V4 is

DeepSeek V4 is a family of open-weights AI models built by DeepSeek, a Chinese AI laboratory. "Open-weights" means the actual model files are freely downloadable — unlike the AI assistants from OpenAI or Anthropic, which you can only access through their companies' services. V4 comes in two sizes: V4-Pro, a massive model with 1.6 trillion total parameters (think of parameters as the knobs the model learned to tune during training), and V4-Flash, a leaner version at 284 billion parameters designed for faster, cheaper responses.

Both models use a design called Mixture of Experts (MoE): instead of running all those parameters on every query, only a small active slice is used at a time — 49 billion for Pro, 13 billion for Flash. This is what lets a model with a staggering total size still be practical to run.

Why it matters

DeepSeek V4 matters for two big reasons: capability and cost.

On capability, V4 supports a 1 million token context window by default — meaning it can read and reason over roughly 750,000 words of text in a single session. That's enough to hold an entire codebase, a long legal document, or hours of conversation history. This is powered by a new technique called DeepSeek Sparse Attention (DSA), which makes processing very long inputs more efficient.

On cost, DeepSeek permanently cut V4-Pro's API price by 75%, continuing a pattern the lab has established across its model generations. When a frontier-class model slashes its price permanently, it puts pressure on every other provider in the market.

How it got here: the V3 lineage

V4 didn't appear out of nowhere. DeepSeek built up to it through a rapid series of releases:

  • DeepSeek V3 launched as a 671-billion-parameter open-source model running at 60 tokens per second — three times faster than its predecessor — at very low API prices.
  • DeepSeek R1 followed as a reasoning-focused model claiming performance on par with OpenAI's o1 on math and coding benchmarks, released under the permissive MIT License.
  • V3.1 added hybrid "thinking and non-thinking" modes in a single model, along with improved tool use for multi-step tasks.
  • V3.2-Exp introduced the sparse attention architecture that V4 would later build on, alongside a 50%+ API price cut.
  • V3.2 integrated chain-of-thought reasoning directly into tool-use workflows, trained on a new pipeline covering over 1,800 environments.

V4 is the synthesis of all these experiments into a single, larger, more capable release.

The controversy: distillation and geopolitics

DeepSeek V4's rise has not been without friction.

Distillation allegations: Anthropic publicly accused DeepSeek (along with Moonshot AI and MiniMax) of running coordinated "distillation attacks" — generating over 16 million exchanges through approximately 24,000 fraudulent accounts to extract responses from Claude and use them to train DeepSeek's own models. Anthropic framed this as both a violation of its terms of service and a national security concern, arguing that models trained this way inherit capabilities without the safety guardrails. A separate report described a broader gray-market ecosystem of API proxy networks enabling this kind of data harvesting at scale.

Hardware access: Before V4's public release, DeepSeek gave Huawei several weeks of early access for hardware optimization — while denying the same to Nvidia and AMD. This was a notable departure from prior practice and signals DeepSeek's deliberate alignment with China's domestic chip ecosystem amid ongoing US export controls.

Benchmark reality check: Despite strong claims on agentic coding tasks, at least one industry analysis noted that V4 trails leading open and closed models on aggregate benchmarks — a reminder that headline numbers don't always tell the full story.

Where it fits in the broader landscape

V4 sits alongside a wave of competitive open-weights releases from Chinese and international labs — Qwen3, Kimi K2.6, and others — all pushing toward frontier capability at lower cost. Nvidia has framed its own open-weights investments partly as a strategic response to Chinese labs building capable models on non-Nvidia hardware. The US government's NIST TRAINS task force, meanwhile, is moving toward pre-deployment national security evaluations of frontier models — a regulatory environment that will increasingly shape what models like V4 can be used for in certain contexts.

For practitioners and businesses, DeepSeek V4 represents a genuine option: frontier-adjacent capability, fully downloadable, with aggressive pricing and broad API compatibility (including OpenAI and Anthropic API formats). The geopolitical and safety questions are real, but so is the model.

The DeepSeek V-series lineage leading to V4

DeepSeek V4 variants at a glance

ModelTotal ParamsActive ParamsContext WindowNotable
DeepSeek V4-Pro1.6T49B1M tokensSOTA claim on agentic coding; 75% permanent price cut
DeepSeek V4-Flash284B13B1M tokensNear-parity reasoning at lower cost and latency
DeepSeek V3671B37BPredecessor; $0.27/$1.10 per M tokens at launch
DeepSeek V3.2128KAdded chain-of-thought tool-use; agent data pipeline

Parameters and context from the events bundle; unknown cells render —.

Timeline

  1. DeepSeek V4-Pro, V4-Flash, and base models released on Hugging Face

  2. Industry analysis notes V4 trails leading models on aggregate benchmarks

  3. DeepSeek makes V4-Pro 75% price discount permanent

  4. Anthropic accuses DeepSeek of industrial-scale distillation attacks via ~24,000 fraudulent accounts

Related topics

Hugging FaceAnthropicOpenAINVIDIAMoonshot AIQwen3DeepSeek-R1-0528

FAQ

What makes DeepSeek V4 different from other big AI models?

It's fully open-weights — anyone can download and run it — and it uses a 'Mixture of Experts' design that activates only a fraction of its parameters at once, making it far cheaper to run than its total size suggests. It also ships with a 1M-token context window by default, which most models don't offer.

What does 'open-weights' mean, and why does it matter?

Open-weights means DeepSeek publishes the actual model files so anyone can download, run, or modify them — unlike ChatGPT or Claude, which you can only access through a company's API. This lets researchers, businesses, and developers use the model without paying per query or being subject to usage restrictions.

What is a Mixture of Experts model?

Think of it like a large team of specialists: instead of every expert weighing in on every question, only the relevant few are activated for each task. V4-Pro has 1.6 trillion parameters total but only uses about 49 billion at a time, keeping costs and speed practical despite the enormous overall size.

Why is DeepSeek controversial?

Anthropic publicly accused DeepSeek of running large-scale 'distillation attacks' — using tens of thousands of fake accounts to extract Claude's responses and train DeepSeek models on them, which Anthropic frames as both a terms-of-service violation and a national security concern. DeepSeek also gave Huawei early access to V4 while blocking Nvidia and AMD, reflecting US-China chip tensions.

Is DeepSeek V4 the best open-weights model available?

It's highly competitive, especially on agentic coding tasks, but at least one industry analysis noted it trails leading open and closed models on aggregate benchmarks — so 'best' depends heavily on the specific task.

Stay current

Call Me Almanac pairs the week's AI news with guides like this one — Midweek & Sunday.

Versions

  • v1live6d ago

Related guides (4)

More on DeepSeek V4 (6)

6Hugging Face Blog·1mo ago·source ↗

DeepSeek-V4: a million-token context that agents can actually use

A Hugging Face blog post discusses DeepSeek-V4, highlighting its million-token context window as a practically usable capability for agentic applications. The post appears to analyze or announce DeepSeek-V4's long-context features in the context of agent workflows. No article body was available for deeper analysis.

6Deepseek News·1mo ago·source ↗

DeepSeek API Major Upgrade: Function Calling, FIM, Chat Prefix Completion, JSON Output, and 8K Token Limit

DeepSeek has released a significant API update adding Function Calling (up to 128 parallel calls, OpenAI-compatible), JSON Output, Chat Prefix Completion, and FIM (Fill-In-the-Middle) Completion to both deepseek-chat and deepseek-coder models. The update also raises the max_tokens ceiling to 8K in the Beta API. Several features are in Beta and will be open-sourced once stable. The Function Calling and JSON Output implementations are explicitly designed to be compatible with the OpenAI API.

7Deepseek News·1mo ago·source ↗

DeepSeek API Introduces Context Caching on Disk, Cutting Token Prices by ~90%

DeepSeek has launched a disk-based context caching service for its API, reducing cache-hit token pricing to $0.014 per million tokens versus $0.14 for cache misses—a 90% cost reduction. The system requires no code changes, runs automatically for prefix-matched inputs, and reduces first-token latency from ~13s to ~500ms on 128K prompts. DeepSeek attributes the feasibility of disk caching to the compact KV cache produced by its MLA (Multi-head Latent Attention) architecture in DeepSeek V2, which it claims makes it the first LLM API provider to deploy extensive disk caching at scale. The service supports up to 1 trillion tokens per day with no concurrency limits.

6Deepseek News·1mo ago·source ↗

DeepSeek-V2.5: Merged Open-Source Model Combining General and Coding Capabilities

DeepSeek has released DeepSeek-V2.5, an open-source model that merges DeepSeek-V2-Chat-0628 and DeepSeek-Coder-V2-0724 into a single unified model. The release improves general conversational capabilities, coding performance, instruction-following, and writing tasks while also strengthening safety properties—raising the overall safety score from 74.4% to 82.6% and reducing safety spillover rate from 11.3% to 4.6%. The model is available via backward-compatible API endpoints (deepseek-chat and deepseek-coder) and on HuggingFace, retaining features like Function Calling, FIM completion, and JSON output. Benchmark results show improvements on HumanEval Python and LiveCodeBench, though SWE-verified performance remains an acknowledged weak area.

7Deepseek News·1mo ago·source ↗

DeepSeek-R1-Lite-Preview Launched with o1-Level Reasoning Performance

DeepSeek has released DeepSeek-R1-Lite-Preview, a reasoning-focused model claiming o1-preview-level performance on AIME and MATH benchmarks. The model features a transparent, real-time chain-of-thought process and demonstrates inference scaling behavior where longer reasoning chains yield better results. DeepSeek has indicated that open-source model weights and a full API are forthcoming. The model is currently accessible via chat.deepseek.com.

9Deepseek News·1mo ago·source ↗

DeepSeek-V3: 671B MoE Open-Source Model with 3x Speed Improvement

DeepSeek releases V3, a 671B parameter Mixture-of-Experts model with 37B activated parameters, trained on 14.8T tokens. The model runs at 60 tokens/second (3x faster than V2) and is fully open-source with weights and paper released. API pricing is set at $0.27/M input tokens and $1.10/M output tokens starting February 8, positioning it as a low-cost frontier alternative. DeepSeek signals future multimodal capabilities in the ecosystem.