DeepSeek-V3: 671B MoE Open-Source Model with 3x Speed Improvement
DeepSeek has released V3, a 671B-parameter Mixture-of-Experts model with 37B activated parameters, trained on 14.8T tokens. The model generates 60 tokens/second (3x faster than V2) and is fully open-source, with both weights and paper released. API pricing is set at $0.27/M input tokens and $1.10/M output tokens starting February 8, positioning it as a low-cost frontier alternative. DeepSeek also signals that multimodal capabilities will come to its ecosystem in the future.
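The figures quoted above lend themselves to quick back-of-the-envelope arithmetic. The following is a minimal sketch, not an official calculator: the function names and the example token counts are hypothetical, and it assumes the list prices stated above apply uniformly with no caching or promotional discounts.

```python
# Prices taken from the summary above (USD per million tokens).
INPUT_USD_PER_M = 0.27
OUTPUT_USD_PER_M = 1.10


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD, assuming flat per-token list pricing."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_M
    return input_cost + output_cost


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of total parameters activated (both in billions)."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    print(f"Estimated cost: ${estimate_cost_usd(2_000_000, 500_000):.2f}")
    # 37B activated out of 671B total, as stated in the summary.
    print(f"Active share: {active_fraction(37, 671):.1%}")
```

With 37B of 671B parameters activated, roughly 5.5% of the model's parameters are activated at a time, which is what makes the MoE design comparatively cheap to serve for its size.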
Related events (8)
DeepSeek V4 Preview Release: 1.6T-param Pro and 284B Flash Models with 1M Context, Open-Sourced
DeepSeek has released DeepSeek-V4 as an open-weights preview, comprising two MoE variants: V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters). Both models support a 1M-token context by default, enabled by a novel architecture that combines token-wise compression with DeepSeek Sparse Attention (DSA). V4-Pro claims open-source SOTA on agentic coding benchmarks and math/STEM/coding performance rivaling top closed-source models, while V4-Flash offers near-parity reasoning at lower cost and latency. The API is live today with OpenAI and Anthropic compatibility, and legacy model endpoints will be retired in July 2026.
DeepSeek Releases V3.2-Exp with Sparse Attention Architecture and 50%+ API Price Cut
DeepSeek has released DeepSeek-V3.2-Exp, an experimental model built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve long-context performance and reduce compute costs during training and inference. Benchmarks indicate V3.2-Exp performs on par with V3.1-Terminus while achieving efficiency gains. The release is accompanied by a 50%+ API price reduction effective immediately, open-weights release on Hugging Face, a technical report, and GPU kernel code in TileLang and CUDA.
DeepSeek releases DeepSeek-V3.2 on Hugging Face
DeepSeek has released DeepSeek-V3.2, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports fp8 precision, is endpoints-compatible, and has accumulated over 3.6 million downloads and 1,446 likes, indicating significant community uptake. This appears to be a successor to DeepSeek-V3, continuing the lab's competitive open-weights model series.
DeepSeek-V3.1 Release: Hybrid Think/Non-Think Model with Agent-Focused Upgrades
DeepSeek has released V3.1, a hybrid inference model supporting both thinking and non-thinking modes in a single model, positioned as their first step toward the agent era. The model features improved tool use and multi-step agent task performance, with benchmarks showing gains on SWE-bench and Terminal-Bench, and faster thinking efficiency compared to DeepSeek-R1-0528. The base model received 840B tokens of continued pretraining for long-context extension and uses a new tokenizer; open-source weights are available on Hugging Face. API updates include 128K context for both modes, Anthropic API format compatibility, and strict function calling support in beta.
DeepSeek releases DeepSeek-V3.1 on Hugging Face
DeepSeek has released DeepSeek-V3.1, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports fp8 precision, text-generation-inference, and endpoint deployment, and has accumulated over 220K downloads and 824 likes shortly after release. This appears to be an updated iteration of the DeepSeek-V3 series, a frontier-class open-weights model family.
DeepSeek-V3-0324 Released with Improved Reasoning, Tool-Use, and MIT License
DeepSeek has released DeepSeek-V3-0324, an updated version of its V3 model featuring major improvements in reasoning performance, front-end development capabilities, and tool-use. The model is now released under the MIT License, matching DeepSeek-R1's open licensing terms. Weights are publicly available on Hugging Face, and the API interface remains unchanged from the prior V3 version.
DeepSeek-V3.2 and V3.2-Speciale Released: Reasoning-First Models with Agent Tool-Use Integration
DeepSeek has released two new open-weights models: DeepSeek-V3.2, the official successor to V3.2-Exp with balanced reasoning and tool-use capabilities, and DeepSeek-V3.2-Speciale, a maxed-out reasoning variant claiming gold-medal performance on IMO, CMO, ICPC World Finals, and IOI 2025. V3.2 is the first DeepSeek model to integrate chain-of-thought thinking directly into tool-use workflows, trained on a new agent data synthesis pipeline covering 1,800+ environments and 85k+ complex instructions. V3.2-Speciale is API-only with no tool-call support, available via a temporary endpoint expiring December 15, 2025, while both models are open-sourced on Hugging Face with an accompanying technical report.
DeepSeek releases DeepSeek-V3.2-Exp on Hugging Face
DeepSeek has published DeepSeek-V3.2-Exp, an experimental text-generation model, on Hugging Face under the deepseek-ai organization. The model uses the deepseek_v32 architecture and supports fp8 precision, with tags indicating eval results and endpoint compatibility. Early traction is notable with nearly 176K downloads and ~1K likes shortly after release.


