DeepSeek to Make Permanent 75% Discount on Flagship AI Model
DeepSeek is permanently reducing pricing on its flagship AI model by 75%, signaling a sustained aggressive pricing strategy rather than a temporary promotional move. This continues the pattern of Chinese AI labs applying significant downward pressure on frontier model API pricing. The move has implications for competitive dynamics across the inference market and may force responses from other major providers.
Related events (8)
DeepSeek Makes V4 Pro Price Discount Permanent
DeepSeek has announced that the previously temporary price discount on its V4 Pro model is now permanent. The change comes amid ongoing inference cost competition among frontier model providers. The announcement drew 234 points and 141 comments on Hacker News.
DeepSeek withholds DeepSeek-V4 pre-release access from Nvidia and AMD, shares with Huawei
DeepSeek has given Huawei several weeks of pre-release access to its upcoming DeepSeek-V4 model for hardware optimization while denying the same access to Nvidia and AMD, a departure from prior practice. Reuters also reported that an unnamed Trump administration official claimed DeepSeek-V4 was trained on Nvidia's most advanced chips despite U.S. export controls, though the sourcing is unverified. The move signals deepening geopolitical fragmentation in AI supply chains and aligns with China's push for domestic chip self-sufficiency. DeepSeek-V4 has not yet been publicly released.
DeepSeek Releases V3.2-Exp with Sparse Attention Architecture and 50%+ API Price Cut
DeepSeek has released DeepSeek-V3.2-Exp, an experimental model built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve long-context performance and reduce compute costs during training and inference. Benchmarks indicate V3.2-Exp performs on par with V3.1-Terminus while achieving efficiency gains. The release is accompanied by a 50%+ API price reduction effective immediately, open-weights release on Hugging Face, a technical report, and GPU kernel code in TileLang and CUDA.
One Year Since the "DeepSeek Moment"
Hugging Face published a retrospective marking one year since the "DeepSeek moment," which shook assumptions about AI development costs and open-weights competitiveness. The piece likely reflects on how DeepSeek's efficient training approach influenced the broader AI landscape, open-weights progress, and inference economics over the past year. Published on the anniversary of the original release, it offers industry analysis from the perspective of a major open-source AI platform.
The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+
Hugging Face publishes a retrospective and forward-looking commentary marking one year since the 'DeepSeek moment,' examining how DeepSeek's open-weight releases reshaped the global open-source AI ecosystem. The piece analyzes the downstream effects on model development, inference economics, and competitive dynamics between open and closed AI labs. It situates these developments within a broader 'AI+' framing, suggesting a new phase of AI integration across industries.
DeepSeek-V3: 671B MoE Open-Source Model with 3x Speed Improvement
DeepSeek releases V3, a 671B parameter Mixture-of-Experts model with 37B activated parameters, trained on 14.8T tokens. The model runs at 60 tokens/second (3x faster than V2) and is fully open-source with weights and paper released. API pricing is set at $0.27/M input tokens and $1.10/M output tokens starting February 8, positioning it as a low-cost frontier alternative. DeepSeek signals future multimodal capabilities in the ecosystem.
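As a rough illustration of the figures in the V3 summary above, the sketch below estimates per-request cost at the listed prices, the share of parameters activated per token, and generation time at the quoted throughput. The function and constant names are hypothetical, and the numbers are only those stated in the summary; actual billing may differ.

```python
# Figures copied from the V3 summary above; all names are illustrative assumptions.
INPUT_PRICE_PER_M = 0.27    # USD per million input tokens
OUTPUT_PRICE_PER_M = 1.10   # USD per million output tokens
TOTAL_PARAMS_B = 671        # total parameters, in billions
ACTIVE_PARAMS_B = 37        # parameters activated per token, in billions
TOKENS_PER_SECOND = 60      # quoted generation speed


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-million-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def active_fraction() -> float:
    """Share of the model's parameters that are activated for each token."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def generation_seconds(output_tokens: int) -> float:
    """Time to generate the output at the quoted tokens-per-second rate."""
    return output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    print(f"Cost for 2,000 in / 500 out: ${request_cost_usd(2_000, 500):.6f}")
    print(f"Active parameter share: {active_fraction():.1%}")
    print(f"Time for 500 output tokens: {generation_seconds(500):.1f} s")
```

The active-parameter share works out to roughly 5.5%, which is the sense in which a Mixture-of-Experts model of this size uses only a small part of its weights for any single token.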
DeepSeek API Introduces Context Caching on Disk, Cutting Token Prices by ~90%
DeepSeek has launched a disk-based context caching service for its API, reducing cache-hit token pricing to $0.014 per million tokens versus $0.14 for cache misses, a 90% cost reduction. The system requires no code changes, runs automatically for prefix-matched inputs, and reduces first-token latency from ~13s to ~500ms on 128K prompts. DeepSeek attributes the feasibility of disk caching to the compact KV cache produced by the MLA (Multi-head Latent Attention) architecture in DeepSeek V2, and claims this makes DeepSeek the first LLM API provider to deploy extensive disk caching at scale. The service supports up to 1 trillion tokens per day with no concurrency limits.
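To show how the cache-hit and cache-miss prices in the summary above combine, here is a minimal sketch that computes a blended input cost for a given cache-hit ratio. It assumes the two prices apply to input (prompt) tokens and that the hit ratio is known in advance; the function names are hypothetical and not part of any DeepSeek API.

```python
# Prices copied from the caching summary above; names are illustrative assumptions.
CACHE_HIT_PRICE_PER_M = 0.014   # USD per million tokens served from cache
CACHE_MISS_PRICE_PER_M = 0.14   # USD per million tokens not found in cache


def input_cost_usd(total_tokens: int, hit_ratio: float) -> float:
    """Blended input cost when a fraction `hit_ratio` of tokens hits the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_tokens = total_tokens * hit_ratio
    miss_tokens = total_tokens - hit_tokens
    total = hit_tokens * CACHE_HIT_PRICE_PER_M + miss_tokens * CACHE_MISS_PRICE_PER_M
    return total / 1_000_000


def savings_fraction(hit_ratio: float) -> float:
    """Fraction of input cost saved relative to paying the miss price for everything."""
    baseline = input_cost_usd(1_000_000, 0.0)
    return 1.0 - input_cost_usd(1_000_000, hit_ratio) / baseline


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 1.0):
        print(f"hit ratio {ratio:.0%}: savings {savings_fraction(ratio):.0%}")
```

The full 90% reduction applies only when every input token is a cache hit; at a 50% hit ratio the blended saving under these assumptions is 45%.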
Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek
A Hugging Face blog post reflecting on one year since the 'DeepSeek moment' examines the architectural decisions shaping China's open-source AI ecosystem. The piece analyzes how Chinese labs have built upon and diverged from DeepSeek's design choices in the intervening year. It situates these developments within the broader context of open-weights model progress and competitive dynamics between Chinese and Western AI development.


