
DeepSeek V4
deepseek-v4-a30ff1de·83 events·first seen 1mo agoAliases: DeepSeek V4, DeepSeek, DeepSeek-AI, DeepSeek-V4, DeepSeek-R1, DeepSeek R1, DeepSeek V3, DeepSeek V2, DeepSeek-V2.5, DeepSeek V3.2, DeepSeek-V3, DeepSeek-V2, DeepSeek-V3.1, DeepSeek-V3.2-Exp, DeepSeek-V3.2, DeepSeek V4 Pro, DeepSeek V3 Pro, DeepSeek-V4-Pro, DeepSeek v3, DeepSeek-V4-Pro-Base, DeepSeek-VL-v2, DeepSeek-V3.2-Exp-Base
Co-occurring entities
More like this (12)
Guides (1)
Recent events (50)
DeepSeek-V4: a million-token context that agents can actually use
A Hugging Face blog post discusses DeepSeek-V4, highlighting its million-token context window as a practically usable capability for agentic applications. The post appears to analyze or announce DeepSeek-V4's long-context features in the context of agent workflows. No article body was available for deeper analysis.
DeepSeek API Major Upgrade: Function Calling, FIM, Chat Prefix Completion, JSON Output, and 8K Token Limit
DeepSeek has released a significant API update adding Function Calling (up to 128 parallel calls, OpenAI-compatible), JSON Output, Chat Prefix Completion, and FIM (Fill-In-the-Middle) Completion to both deepseek-chat and deepseek-coder models. The update also raises the max_tokens ceiling to 8K in the Beta API. Several features are in Beta and will be open-sourced once stable. The Function Calling and JSON Output implementations are explicitly designed to be compatible with the OpenAI API.
DeepSeek API Introduces Context Caching on Disk, Cutting Token Prices by ~90%
DeepSeek has launched a disk-based context caching service for its API, reducing cache-hit token pricing to $0.014 per million tokens versus $0.14 for cache misses—a 90% cost reduction. The system requires no code changes, runs automatically for prefix-matched inputs, and reduces first-token latency from ~13s to ~500ms on 128K prompts. DeepSeek attributes the feasibility of disk caching to the compact KV cache produced by its MLA (Multi-head Latent Attention) architecture in DeepSeek V2, which it claims makes it the first LLM API provider to deploy extensive disk caching at scale. The service supports up to 1 trillion tokens per day with no concurrency limits.
DeepSeek-V2.5: Merged Open-Source Model Combining General and Coding Capabilities
DeepSeek has released DeepSeek-V2.5, an open-source model that merges DeepSeek-V2-Chat-0628 and DeepSeek-Coder-V2-0724 into a single unified model. The release improves general conversational capabilities, coding performance, instruction-following, and writing tasks while also strengthening safety properties—raising the overall safety score from 74.4% to 82.6% and reducing safety spillover rate from 11.3% to 4.6%. The model is available via backward-compatible API endpoints (deepseek-chat and deepseek-coder) and on HuggingFace, retaining features like Function Calling, FIM completion, and JSON output. Benchmark results show improvements on HumanEval Python and LiveCodeBench, though SWE-verified performance remains an acknowledged weak area.
DeepSeek-R1-Lite-Preview Launched with o1-Level Reasoning Performance
DeepSeek has released DeepSeek-R1-Lite-Preview, a reasoning-focused model claiming o1-preview-level performance on AIME and MATH benchmarks. The model features a transparent, real-time chain-of-thought process and demonstrates inference scaling behavior where longer reasoning chains yield better results. DeepSeek has indicated that open-source model weights and a full API are forthcoming. The model is currently accessible via chat.deepseek.com.
DeepSeek-V3: 671B MoE Open-Source Model with 3x Speed Improvement
DeepSeek releases V3, a 671B parameter Mixture-of-Experts model with 37B activated parameters, trained on 14.8T tokens. The model runs at 60 tokens/second (3x faster than V2) and is fully open-source with weights and paper released. API pricing is set at $0.27/M input tokens and $1.10/M output tokens starting February 8, positioning it as a low-cost frontier alternative. DeepSeek signals future multimodal capabilities in the ecosystem.
DeepSeek-R1 Release: Open-Source Reasoning Model on Par with OpenAI o1
DeepSeek has released DeepSeek-R1, a reasoning-focused large language model claiming performance parity with OpenAI o1 on math, code, and reasoning benchmarks. The model is fully open-source under the MIT License, including weights and outputs, enabling distillation and commercial use. Six distilled smaller models (up to 32B and 70B) are also released, with the 32B and 70B variants reportedly matching OpenAI o1-mini. API access is live at significantly lower pricing than comparable frontier models ($0.55/M input tokens, $2.19/M output tokens).
DeepSeek-V3-0324 Released with Improved Reasoning, Tool-Use, and MIT License
DeepSeek has released DeepSeek-V3-0324, an updated version of its V3 model featuring major improvements in reasoning performance, front-end development capabilities, and tool-use. The model is now released under the MIT License, matching DeepSeek-R1's open licensing terms. Weights are publicly available on Hugging Face, and the API interface remains unchanged from the prior V3 version.
DeepSeek-R1-0528 Released with Improved Benchmarks, Reduced Hallucinations, and Function Calling
DeepSeek has released DeepSeek-R1-0528, an updated version of its R1 reasoning model featuring improved benchmark performance, reduced hallucinations, enhanced front-end capabilities, and new support for JSON output and function calling. The API interface remains unchanged, and open-source weights are available on Hugging Face. This is an incremental update to the R1 series rather than a new flagship model.
DeepSeek-V3.1 Release: Hybrid Think/Non-Think Model with Agent-Focused Upgrades
DeepSeek has released V3.1, a hybrid inference model supporting both thinking and non-thinking modes in a single model, positioned as their first step toward the agent era. The model features improved tool use and multi-step agent task performance, with benchmarks showing gains on SWE-bench and Terminal-Bench, and faster thinking efficiency compared to DeepSeek-R1-0528. The base model received 840B tokens of continued pretraining for long-context extension, a new tokenizer, and open-source weights are available on HuggingFace. API updates include 128K context for both modes, Anthropic API format compatibility, and strict function calling support in beta.
DeepSeek releases V3.1-Terminus, an incremental update to V3.1 with agent and language consistency improvements
DeepSeek has released DeepSeek-V3.1-Terminus, an update to its V3.1 model addressing user feedback on language mixing issues and improving Code Agent and Search Agent performance. The release claims more stable and reliable benchmark outputs compared to V3.1. Weights are publicly available on Hugging Face, and the model is accessible via the DeepSeek app, web, and API.
DeepSeek Releases V3.2-Exp with Sparse Attention Architecture and 50%+ API Price Cut
DeepSeek has released DeepSeek-V3.2-Exp, an experimental model built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve long-context performance and reduce compute costs during training and inference. Benchmarks indicate V3.2-Exp performs on par with V3.1-Terminus while achieving efficiency gains. The release is accompanied by a 50%+ API price reduction effective immediately, open-weights release on Hugging Face, a technical report, and GPU kernel code in TileLang and CUDA.
DeepSeek-V3.2 and V3.2-Speciale Released: Reasoning-First Models with Agent Tool-Use Integration
DeepSeek has released two new open-weights models: DeepSeek-V3.2, the official successor to V3.2-Exp with balanced reasoning and tool-use capabilities, and DeepSeek-V3.2-Speciale, a maxed-out reasoning variant claiming gold-medal performance on IMO, CMO, ICPC World Finals, and IOI 2025. V3.2 is the first DeepSeek model to integrate chain-of-thought thinking directly into tool-use workflows, trained on a new agent data synthesis pipeline covering 1,800+ environments and 85k+ complex instructions. V3.2-Speciale is API-only with no tool-call support, available via a temporary endpoint expiring December 15, 2025, while both models are open-sourced on Hugging Face with an accompanying technical report.
DeepSeek V4 Preview Release: 1.6T-param Pro and 284B Flash Models with 1M Context, Open-Sourced
DeepSeek has released DeepSeek-V4 as an open-weights preview, comprising two MoE variants: V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters). Both models support 1M token context by default, enabled by a novel Token-wise compression and DeepSeek Sparse Attention (DSA) architecture. V4-Pro claims open-source SOTA on agentic coding benchmarks and world-class math/STEM/coding performance rivaling top closed-source models, while V4-Flash offers near-parity reasoning at lower cost and latency. The API is live today with OpenAI and Anthropic compatibility, and legacy model endpoints will be retired in July 2026.
DeepSeek Makes V4 Pro Price Discount Permanent
DeepSeek has announced that the previously temporary price discount on its V4 Pro model is now permanent. This pricing change is notable in the context of ongoing inference cost competition among frontier model providers. The announcement generated significant community discussion on Hacker News with 234 points and 141 comments.
DeepSeek to Make Permanent 75% Discount on Flagship AI Model
DeepSeek is permanently reducing pricing on its flagship AI model by 75%, signaling a sustained aggressive pricing strategy rather than a temporary promotional move. This continues the pattern of Chinese AI labs applying significant downward pressure on frontier model API pricing. The move has implications for competitive dynamics across the inference market and may force responses from other major providers.
DeepSeek withholds DeepSeek-V4 pre-release access from Nvidia and AMD, shares with Huawei
DeepSeek has given Huawei several weeks of pre-release access to its upcoming DeepSeek-V4 model for hardware optimization, while denying the same access to Nvidia and AMD — a departure from prior practice. Reuters also reported that an unnamed Trump administration official claims DeepSeek-V4 was trained on Nvidia's most advanced chips despite U.S. export controls, though the sourcing is unverified. The move signals deepening geopolitical fragmentation in AI supply chains and aligns with China's push for domestic chip self-sufficiency. DeepSeek-V4 has not yet been publicly released.
DeepSeek releases DeepSeek-V4-Pro on Hugging Face
DeepSeek has released DeepSeek-V4-Pro, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports FP8 and 8-bit quantization formats and is tagged as endpoints-compatible with eval results included. With over 4.3 million downloads and 4,740 likes, it has attracted significant community uptake.
DeepSeek releases DeepSeek-V3.2 on Hugging Face
DeepSeek has released DeepSeek-V3.2, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports fp8 precision, is endpoints-compatible, and has accumulated over 3.6 million downloads and 1,446 likes, indicating significant community uptake. This appears to be a successor to DeepSeek-V3, continuing the lab's competitive open-weights model series.
DeepSeek releases DeepSeek-V4-Pro-Base on Hugging Face
DeepSeek has released DeepSeek-V4-Pro-Base, a new base model, on Hugging Face with fp8 and safetensors support. The model has accumulated over 20,000 downloads and 291 likes shortly after release. This represents a new generation in DeepSeek's V-series open-weights frontier models.
DeepSeek releases DeepSeek-V4-Flash on Hugging Face
DeepSeek has released DeepSeek-V4-Flash, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports FP8 and 8-bit quantization and is tagged as conversational and endpoints-compatible. With over 2.8 million downloads and 1,455 likes, it has seen substantial early uptake.
DeepSeek releases DeepSeek-V3.2-Exp on Hugging Face
DeepSeek has published DeepSeek-V3.2-Exp, an experimental text-generation model, on Hugging Face under the deepseek-ai organization. The model uses the deepseek_v32 architecture and supports fp8 precision, with tags indicating eval results and endpoint compatibility. Early traction is notable with nearly 176K downloads and ~1K likes shortly after release.
DeepSeek releases DeepSeek-V3.1-Terminus on Hugging Face
DeepSeek has published DeepSeek-V3.1-Terminus, a new text-generation model, on Hugging Face under the deepseek_v3 architecture family. The model supports FP8 precision, safetensors format, and is compatible with text-generation-inference endpoints. Early traction is visible with over 11,500 downloads and 365 likes shortly after release.
DeepSeek releases DeepSeek-V3.1 on Hugging Face
DeepSeek has released DeepSeek-V3.1, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports fp8 precision, text-generation-inference, and endpoint deployment, and has accumulated over 220K downloads and 824 likes shortly after release. This appears to be an updated iteration of the DeepSeek-V3 series, a frontier-class open-weights model family.
DeepSeek releases DeepSeek-V3.1-Base on Hugging Face
DeepSeek has released DeepSeek-V3.1-Base, a new base model for text generation, on Hugging Face. The model supports fp8 precision, safetensors format, and is compatible with text-generation-inference endpoints. With over 1,000 likes and nearly 9,000 downloads shortly after release, it is attracting significant community attention as a successor to the widely-used DeepSeek-V3.
Hugging Face open reproduction of DeepSeek-R1
Hugging Face has published an open reproduction of DeepSeek-R1, the reasoning-focused language model, on GitHub. The project aims to replicate DeepSeek-R1's training methodology and capabilities in an open-weights setting. This contributes to the broader effort to make frontier reasoning model techniques accessible to the research community.
DeepSeek V2.5-1210: Final Update to V2.5 Series, V3 Generation Teased
DeepSeek has released DeepSeek-V2.5-1210, the final update to its V2.5 model series, with claimed improvements across math, coding, writing, and roleplay benchmarks. The model is available as open weights on Hugging Face. DeepSeek also announced the launch of Internet Search on chat.deepseek.com. The release marks the end of the V2 generation, with the company signaling work on next-generation foundation models.
Open-R1: Update #1 — Open Reproduction of DeepSeek-R1
Hugging Face's Open-R1 project provides a first progress update on its open reproduction of DeepSeek-R1, a reasoning-focused language model. The update covers early training runs, dataset construction, and evaluation results aimed at replicating DeepSeek-R1's chain-of-thought reasoning capabilities. This effort is part of the broader open-weights community push to reproduce frontier reasoning models transparently.
Mini-R1: Reproducing DeepSeek R1 'Aha Moment' — An RL Tutorial
A Hugging Face blog post demonstrates how to reproduce DeepSeek R1's emergent 'aha moment' reasoning behavior using reinforcement learning on a countdown game task. The tutorial walks through training a smaller model with RL to exhibit chain-of-thought self-correction, similar to the behavior observed in DeepSeek R1. This serves as a practical open-source replication effort aimed at demystifying R1's training dynamics.
How to Deploy and Fine-Tune DeepSeek Models on AWS
Hugging Face published a guide covering deployment and fine-tuning of DeepSeek models on AWS infrastructure. The post addresses practical integration patterns for running DeepSeek-R1 and related models using AWS services alongside Hugging Face tooling. This is a tier-2 commentary/tutorial piece targeting practitioners who want to operationalize DeepSeek models in cloud environments.
Open-R1: a fully open reproduction of DeepSeek-R1
Hugging Face announced Open-R1, a community effort to fully reproduce DeepSeek-R1's training pipeline using open-source components. The project aims to replicate the data, training, and evaluation stages of DeepSeek-R1, making the entire process transparent and accessible. This follows significant interest in DeepSeek-R1's reinforcement-learning-based reasoning approach and addresses the lack of fully open reproduction of that methodology.
The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+
Hugging Face publishes a retrospective and forward-looking commentary marking one year since the 'DeepSeek moment,' examining how DeepSeek's open-weight releases reshaped the global open-source AI ecosystem. The piece analyzes the downstream effects on model development, inference economics, and competitive dynamics between open and closed AI labs. It situates these developments within a broader 'AI+' framing, suggesting a new phase of AI integration across industries.
Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek
A Hugging Face blog post reflecting on one year since the 'DeepSeek moment' examines the architectural decisions shaping China's open-source AI ecosystem. The piece analyzes how Chinese labs have built upon and diverged from DeepSeek's design choices in the intervening year. It situates these developments within the broader context of open-weights model progress and competitive dynamics between Chinese and Western AI development.
One Year Since the "DeepSeek Moment"
A Hugging Face retrospective marking one year since the DeepSeek moment, which shook assumptions about AI development costs and open-weights competitiveness. The piece likely reflects on how DeepSeek's efficient training approach influenced the broader AI landscape, open-weights progress, and inference economics over the past year. Published on the anniversary of the original release, it offers industry analysis from a major open-source AI platform perspective.
Anthropic Identifies Industrial-Scale Distillation Attacks by DeepSeek, Moonshot, and MiniMax
Anthropic has publicly identified three Chinese AI laboratories—DeepSeek, Moonshot AI, and MiniMax—as conducting coordinated, large-scale distillation attacks against Claude, generating over 16 million exchanges through approximately 24,000 fraudulent accounts in violation of terms of service. The campaigns targeted Claude's most differentiated capabilities including agentic reasoning, tool use, coding, and chain-of-thought generation, with MiniMax alone responsible for over 13 million exchanges. Anthropic frames these attacks as a national security concern, arguing that illicitly distilled models strip out safety safeguards and undermine US export controls. The company claims high-confidence attribution via IP correlation, request metadata, and infrastructure indicators, in some cases corroborated by industry partners.
DeepSeek V4 Pro reportedly beats GPT-5.5 Pro on precision
A community-discussed item (267 HN points, 120 comments) claims DeepSeek V4 Pro outperforms GPT-5.5 Pro on a precision metric. The source is a third-party site aggregating or reporting on benchmark comparisons between two frontier models. The discussion signals continued interest in the open-weights vs. closed-model competitive dynamic.
DeepSeek releases DeepSeek-V4-Flash-Base on Hugging Face
DeepSeek has released DeepSeek-V4-Flash-Base, a new open-weights base model, on Hugging Face. The model uses FP8 precision and the deepseek_v4 architecture with safetensors format. Early traction is notable with over 66,000 downloads and 241 likes shortly after release, suggesting significant community interest in a 'Flash' variant of the V4 series.
DeepSeek releases DeepSeek-V3.2-Speciale on Hugging Face
DeepSeek has published DeepSeek-V3.2-Speciale, a new text-generation model, on Hugging Face under the deepseek-ai organization. The model uses the deepseek_v32 architecture and supports fp8 precision with safetensors format. Early traction is notable with nearly 10,000 downloads and 708 likes shortly after release.
DeepSeek releases DeepSeek-OCR-2 vision-language model on Hugging Face
DeepSeek has released DeepSeek-OCR-2, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture and tagged for OCR and vision-language tasks. The model has accumulated over 1.8 million downloads and 980 likes, indicating substantial community uptake. It extends DeepSeek's multimodal model lineup with a specialized document/OCR capability.
DeepSeek releases R1-0528-Qwen3-8B distilled reasoning model on Hugging Face
DeepSeek released DeepSeek-R1-0528-Qwen3-8B, an 8B parameter text-generation model on Hugging Face, combining the R1-0528 reasoning capabilities with a Qwen3 base. The model has accumulated over 306K downloads and 1K likes shortly after release, indicating strong community uptake. This appears to be a distilled version of the R1-0528 reasoning model targeting smaller-scale deployment.
DeepSeek releases DeepSeek-Math-V2 on Hugging Face
DeepSeek has released DeepSeek-Math-V2, a math-specialized text-generation model, on Hugging Face. The model uses the deepseek_v32 architecture and is available in fp8 format with safetensors support. Early engagement metrics show 697 likes and 416 downloads, suggesting notable community interest for a new release.
DeepSeek releases DeepSeek-OCR vision-language model on Hugging Face
DeepSeek has released DeepSeek-OCR, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture. The model targets OCR and image feature extraction tasks and has accumulated over 2.4 million downloads and 3,275 likes, indicating significant community uptake. This represents an open-weights multimodal release from a major Chinese AI lab.
DeepSeek releases DeepSeek-V3.2-Exp-Base on Hugging Face
DeepSeek has published DeepSeek-V3.2-Exp-Base, an experimental base model for text generation, on Hugging Face. The model uses the deepseek_v32 architecture and supports fp8 precision with safetensors format. This appears to be a new experimental iteration in the DeepSeek-V3 series, though no technical details or benchmark results are provided in the release metadata.
Latest open artifacts (#21): Open model bonanza — Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others
Interconnects' recurring open-weights roundup covers a dense cluster of recent releases including Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, and GLM-5.1, characterizing the period as a flagship-after-flagship cadence. The piece also includes commentary on CAISI's assessment of DeepSeek V4. As a tier-2 commentary source, this is a synthesis and analysis layer rather than primary announcements.
Data Points: OpenAI and Microsoft sever their exclusive relationship
This edition of The Batch covers several major AI industry developments: OpenAI has revised its partnership with Microsoft, ending exclusivity while retaining Microsoft as primary cloud partner through 2032 and gaining freedom to deploy on AWS and Google Cloud. DeepSeek released V4 model weights featuring 1M-token context and Huawei Ascend chip optimization, though it trails leading open and closed models on aggregate benchmarks. Google and Amazon are deepening investments in Anthropic with up to $40B and $25B respectively in funding-for-compute deals, and an agentic AI system autonomously designed a functional RISC-V CPU from a 219-word spec in 12 hours.
Open R1: Update #3
Hugging Face's Open R1 project releases its third update, continuing the open-source replication effort of DeepSeek-R1's reasoning model training pipeline. The update likely covers progress on data, training runs, and evaluation results for the community-driven reproduction. This is part of an ongoing effort to make frontier reasoning model capabilities accessible via open weights and open training code.
Open R1: Update #2
Hugging Face's Open R1 project releases its second progress update on the open-source replication of DeepSeek-R1's reasoning capabilities. The update likely covers training progress, dataset releases, and intermediate model checkpoints as the team works toward a fully open reproduction of the reasoning model pipeline. Open R1 is a community-driven effort to make the techniques behind frontier reasoning models accessible to researchers.
Open R1: Update #4
Hugging Face's Open R1 project releases its fourth progress update on the open reproduction of DeepSeek-R1. The update likely covers training progress, dataset releases, and evaluation results for the open-weights reasoning model effort. This project is a community-driven attempt to replicate and open-source the techniques behind DeepSeek-R1's chain-of-thought reasoning capabilities.
DeepSeek Reasonix: Native Coding Agent with High Caching and Low Cost
DeepSeek Reasonix is a coding agent built natively on DeepSeek models, emphasizing high prompt caching rates and low inference cost. The project attracted significant Hacker News engagement (349 points, 171 comments), suggesting community interest in cost-efficient agentic coding workflows. It appears to be an open-source or community-developed tool rather than an official DeepSeek Labs release.
The Batch Issue 345: Iranian Drone Attacks on AWS Data Centers, Qwen3.5, DeepSeek-Huawei, and AI Job Insecurity
Andrew Ng's weekly newsletter covers several significant AI-adjacent developments: Iranian drones struck at least three Amazon Web Services data centers in Bahrain and the UAE, disrupting cloud services and raising concerns given U.S. military use of AWS to run Anthropic Claude; the issue also previews Qwen3.5 model releases across multiple sizes and DeepSeek's reported moves involving Huawei hardware. Ng also addresses widespread job insecurity across skill levels amid rapid AI advancement, citing geopolitical risks including the Iran war, Taiwan uncertainty, and rare-earth metal supply chains as compounding factors.
