model

DeepSeek V4

modelactivedeepseek-v4-a30ff1de·83 events·first seen 1mo ago

Aliases: DeepSeek V4, DeepSeek, DeepSeek-AI, DeepSeek-V4, DeepSeek-R1, DeepSeek R1, DeepSeek V3, DeepSeek V2, DeepSeek-V2.5, DeepSeek V3.2, DeepSeek-V3, DeepSeek-V2, DeepSeek-V3.1, DeepSeek-V3.2-Exp, DeepSeek-V3.2, DeepSeek V4 Pro, DeepSeek V3 Pro, DeepSeek-V4-Pro, DeepSeek v3, DeepSeek-V4-Pro-Base, DeepSeek-VL-v2, DeepSeek-V3.2-Exp-Base

Co-occurring entities

More like this (12)

DeepSeek-V3.1-Base DeepSeek-V3-0324 DeepSeek-V4-Flash DeepSeek-V2.5-1210 DeepSeek-V4-Pro Preview DeepSeek API DeepSeek-V3.2-Speciale DeepSeek-R1-Lite-Preview DeepSeek-Prover-V2-7B DeepSeek-R1-0528 DeepSeek-V3.1-Terminus DeepSeek-Coder-V2-0724

Guides (1)

DeepSeek V4

DeepSeek V4: The Open-Weights Giant Reshaping AI Economics

Read asBeginner In-depth

Recent events (50)

6Hugging Face Blog·1mo ago·source ↗

DeepSeek-V4: a million-token context that agents can actually use

A Hugging Face blog post discusses DeepSeek-V4, highlighting its million-token context window as a practically usable capability for agentic applications. The post appears to analyze or announce DeepSeek-V4's long-context features in the context of agent workflows. No article body was available for deeper analysis.

Long Context Evolution Frontier Model Releases DeepSeek V4 Hugging Face +2 more

6Deepseek News·1mo ago·source ↗

DeepSeek API Major Upgrade: Function Calling, FIM, Chat Prefix Completion, JSON Output, and 8K Token Limit

DeepSeek has released a significant API update adding Function Calling (up to 128 parallel calls, OpenAI-compatible), JSON Output, Chat Prefix Completion, and FIM (Fill-In-the-Middle) Completion to both deepseek-chat and deepseek-coder models. The update also raises the max_tokens ceiling to 8K in the Beta API. Several features are in Beta and will be open-sourced once stable. The Function Calling and JSON Output implementations are explicitly designed to be compatible with the OpenAI API.

Open Weights Progress Inference Economics DeepSeek V4 FIM Completion deepseek-chat +5 more

7Deepseek News·1mo ago·source ↗

DeepSeek API Introduces Context Caching on Disk, Cutting Token Prices by ~90%

DeepSeek has launched a disk-based context caching service for its API, reducing cache-hit token pricing to $0.014 per million tokens versus $0.14 for cache misses—a 90% cost reduction. The system requires no code changes, runs automatically for prefix-matched inputs, and reduces first-token latency from ~13s to ~500ms on 128K prompts. DeepSeek attributes the feasibility of disk caching to the compact KV cache produced by its MLA (Multi-head Latent Attention) architecture in DeepSeek V2, which it claims makes it the first LLM API provider to deploy extensive disk caching at scale. The service supports up to 1 trillion tokens per day with no concurrency limits.

Long Context Evolution Frontier Model Releases DeepSeek API DeepSeek V4 Context Caching on Disk +2 more

6Deepseek News·1mo ago·source ↗

DeepSeek-V2.5: Merged Open-Source Model Combining General and Coding Capabilities

DeepSeek has released DeepSeek-V2.5, an open-source model that merges DeepSeek-V2-Chat-0628 and DeepSeek-Coder-V2-0724 into a single unified model. The release improves general conversational capabilities, coding performance, instruction-following, and writing tasks while also strengthening safety properties—raising the overall safety score from 74.4% to 82.6% and reducing safety spillover rate from 11.3% to 4.6%. The model is available via backward-compatible API endpoints (deepseek-chat and deepseek-coder) and on HuggingFace, retaining features like Function Calling, FIM completion, and JSON output. Benchmark results show improvements on HumanEval Python and LiveCodeBench, though SWE-verified performance remains an acknowledged weak area.

Frontier Model Releases Evaluation and Benchmarking DeepSeek-V2-Chat-0628 DeepSeek V4 SWE-Bench Verified +8 more

7Deepseek News·1mo ago·source ↗

DeepSeek-R1-Lite-Preview Launched with o1-Level Reasoning Performance

DeepSeek has released DeepSeek-R1-Lite-Preview, a reasoning-focused model claiming o1-preview-level performance on AIME and MATH benchmarks. The model features a transparent, real-time chain-of-thought process and demonstrates inference scaling behavior where longer reasoning chains yield better results. DeepSeek has indicated that open-source model weights and a full API are forthcoming. The model is currently accessible via chat.deepseek.com.

Frontier Model Releases Evaluation and Benchmarking o1-preview DeepSeek V4 AIME +4 more

9Deepseek News·1mo ago·source ↗

DeepSeek-V3: 671B MoE Open-Source Model with 3x Speed Improvement

DeepSeek releases V3, a 671B parameter Mixture-of-Experts model with 37B activated parameters, trained on 14.8T tokens. The model runs at 60 tokens/second (3x faster than V2) and is fully open-source with weights and paper released. API pricing is set at $0.27/M input tokens and $1.10/M output tokens starting February 8, positioning it as a low-cost frontier alternative. DeepSeek signals future multimodal capabilities in the ecosystem.

Frontier Model Releases Open Weights Progress DeepSeek V4 Mixture of Experts +2 more

9Deepseek News·1mo ago·source ↗

DeepSeek-R1 Release: Open-Source Reasoning Model on Par with OpenAI o1

DeepSeek has released DeepSeek-R1, a reasoning-focused large language model claiming performance parity with OpenAI o1 on math, code, and reasoning benchmarks. The model is fully open-source under the MIT License, including weights and outputs, enabling distillation and commercial use. Six distilled smaller models (up to 32B and 70B) are also released, with the 32B and 70B variants reportedly matching OpenAI o1-mini. API access is live at significantly lower pricing than comparable frontier models ($0.55/M input tokens, $2.19/M output tokens).

Frontier Model Releases Evaluation and Benchmarking DeepSeek API DeepSeek V4 OpenAI o3-mini +5 more

7Deepseek News·1mo ago·source ↗

DeepSeek-V3-0324 Released with Improved Reasoning, Tool-Use, and MIT License

DeepSeek has released DeepSeek-V3-0324, an updated version of its V3 model featuring major improvements in reasoning performance, front-end development capabilities, and tool-use. The model is now released under the MIT License, matching DeepSeek-R1's open licensing terms. Weights are publicly available on Hugging Face, and the API interface remains unchanged from the prior V3 version.

Frontier Model Releases Open Weights Progress DeepSeek-V3-0324 DeepSeek V4 MIT License +2 more

6Deepseek News·1mo ago·source ↗

DeepSeek-R1-0528 Released with Improved Benchmarks, Reduced Hallucinations, and Function Calling

DeepSeek has released DeepSeek-R1-0528, an updated version of its R1 reasoning model featuring improved benchmark performance, reduced hallucinations, enhanced front-end capabilities, and new support for JSON output and function calling. The API interface remains unchanged, and open-source weights are available on Hugging Face. This is an incremental update to the R1 series rather than a new flagship model.

Frontier Model Releases Open Weights Progress DeepSeek-R1-0528 DeepSeek V4 Hugging Face +1 more

8Deepseek News·1mo ago·source ↗

DeepSeek-V3.1 Release: Hybrid Think/Non-Think Model with Agent-Focused Upgrades

DeepSeek has released V3.1, a hybrid inference model supporting both thinking and non-thinking modes in a single model, positioned as their first step toward the agent era. The model features improved tool use and multi-step agent task performance, with benchmarks showing gains on SWE-bench and Terminal-Bench, and faster thinking efficiency compared to DeepSeek-R1-0528. The base model received 840B tokens of continued pretraining for long-context extension, a new tokenizer, and open-source weights are available on HuggingFace. API updates include 128K context for both modes, Anthropic API format compatibility, and strict function calling support in beta.

Long Context Evolution Frontier Model Releases DeepSeek-R1-0528 DeepSeek V4 SWE-bench +6 more

5Deepseek News·1mo ago·source ↗

DeepSeek releases V3.1-Terminus, an incremental update to V3.1 with agent and language consistency improvements

DeepSeek has released DeepSeek-V3.1-Terminus, an update to its V3.1 model addressing user feedback on language mixing issues and improving Code Agent and Search Agent performance. The release claims more stable and reliable benchmark outputs compared to V3.1. Weights are publicly available on Hugging Face, and the model is accessible via the DeepSeek app, web, and API.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face DeepSeek-V3.1-Terminus +1 more

8Deepseek News·1mo ago·source ↗

DeepSeek Releases V3.2-Exp with Sparse Attention Architecture and 50%+ API Price Cut

DeepSeek has released DeepSeek-V3.2-Exp, an experimental model built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve long-context performance and reduce compute costs during training and inference. Benchmarks indicate V3.2-Exp performs on par with V3.1-Terminus while achieving efficiency gains. The release is accompanied by a 50%+ API price reduction effective immediately, open-weights release on Hugging Face, a technical report, and GPU kernel code in TileLang and CUDA.

Training Infrastructure Long Context Evolution DeepSeek API DeepSeek V4 TileLang +5 more

8Deepseek News·1mo ago·source ↗

DeepSeek-V3.2 and V3.2-Speciale Released: Reasoning-First Models with Agent Tool-Use Integration

DeepSeek has released two new open-weights models: DeepSeek-V3.2, the official successor to V3.2-Exp with balanced reasoning and tool-use capabilities, and DeepSeek-V3.2-Speciale, a maxed-out reasoning variant claiming gold-medal performance on IMO, CMO, ICPC World Finals, and IOI 2025. V3.2 is the first DeepSeek model to integrate chain-of-thought thinking directly into tool-use workflows, trained on a new agent data synthesis pipeline covering 1,800+ environments and 85k+ complex instructions. V3.2-Speciale is API-only with no tool-call support, available via a temporary endpoint expiring December 15, 2025, while both models are open-sourced on Hugging Face with an accompanying technical report.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 Gemini-3.0-Pro ICPC World Finals +8 more

9Deepseek News·1mo ago·source ↗

DeepSeek V4 Preview Release: 1.6T-param Pro and 284B Flash Models with 1M Context, Open-Sourced

DeepSeek has released DeepSeek-V4 as an open-weights preview, comprising two MoE variants: V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters). Both models support 1M token context by default, enabled by a novel Token-wise compression and DeepSeek Sparse Attention (DSA) architecture. V4-Pro claims open-source SOTA on agentic coding benchmarks and world-class math/STEM/coding performance rivaling top closed-source models, while V4-Flash offers near-parity reasoning at lower cost and latency. The API is live today with OpenAI and Anthropic compatibility, and legacy model endpoints will be retired in July 2026.

Long Context Evolution Frontier Model Releases DeepSeek V4 DeepSeek-V4-Flash Claude Code +7 more

6Hacker News·29d ago·source ↗

DeepSeek Makes V4 Pro Price Discount Permanent

DeepSeek has announced that the previously temporary price discount on its V4 Pro model is now permanent. This pricing change is notable in the context of ongoing inference cost competition among frontier model providers. The announcement generated significant community discussion on Hacker News with 234 points and 141 comments.

Frontier Model Releases Inference Economics DeepSeek V4

6Hacker News·27d ago·source ↗

DeepSeek to Make Permanent 75% Discount on Flagship AI Model

DeepSeek is permanently reducing pricing on its flagship AI model by 75%, signaling a sustained aggressive pricing strategy rather than a temporary promotional move. This continues the pattern of Chinese AI labs applying significant downward pressure on frontier model API pricing. The move has implications for competitive dynamics across the inference market and may force responses from other major providers.

Frontier Model Releases Inference Economics DeepSeek V4 DeepSeek flagship model

6The Batch·18d ago·source ↗

DeepSeek withholds DeepSeek-V4 pre-release access from Nvidia and AMD, shares with Huawei

DeepSeek has given Huawei several weeks of pre-release access to its upcoming DeepSeek-V4 model for hardware optimization, while denying the same access to Nvidia and AMD — a departure from prior practice. Reuters also reported that an unnamed Trump administration official claims DeepSeek-V4 was trained on Nvidia's most advanced chips despite U.S. export controls, though the sourcing is unverified. The move signals deepening geopolitical fragmentation in AI supply chains and aligns with China's push for domestic chip self-sufficiency. DeepSeek-V4 has not yet been publicly released.

Frontier Model Releases Open Weights Progress Reuters DeepSeek V4 NVIDIA +3 more

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V4-Pro on Hugging Face

DeepSeek has released DeepSeek-V4-Pro, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports FP8 and 8-bit quantization formats and is tagged as endpoints-compatible with eval results included. With over 4.3 million downloads and 4,740 likes, it has attracted significant community uptake.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V3.2 on Hugging Face

DeepSeek has released DeepSeek-V3.2, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports fp8 precision, is endpoints-compatible, and has accumulated over 3.6 million downloads and 1,446 likes, indicating significant community uptake. This appears to be a successor to DeepSeek-V3, continuing the lab's competitive open-weights model series.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V4-Pro-Base on Hugging Face

DeepSeek has released DeepSeek-V4-Pro-Base, a new base model, on Hugging Face with fp8 and safetensors support. The model has accumulated over 20,000 downloads and 291 likes shortly after release. This represents a new generation in DeepSeek's V-series open-weights frontier models.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V4-Flash on Hugging Face

DeepSeek has released DeepSeek-V4-Flash, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports FP8 and 8-bit quantization and is tagged as conversational and endpoints-compatible. With over 2.8 million downloads and 1,455 likes, it has seen substantial early uptake.

Frontier Model Releases Open Weights Progress DeepSeek V4 DeepSeek-V4-Flash Hugging Face +1 more

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V3.2-Exp on Hugging Face

DeepSeek has published DeepSeek-V3.2-Exp, an experimental text-generation model, on Hugging Face under the deepseek-ai organization. The model uses the deepseek_v32 architecture and supports fp8 precision, with tags indicating eval results and endpoint compatibility. Early traction is notable with nearly 176K downloads and ~1K likes shortly after release.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V3.1-Terminus on Hugging Face

DeepSeek has published DeepSeek-V3.1-Terminus, a new text-generation model, on Hugging Face under the deepseek_v3 architecture family. The model supports FP8 precision, safetensors format, and is compatible with text-generation-inference endpoints. Early traction is visible with over 11,500 downloads and 365 likes shortly after release.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face DeepSeek-V3.1-Terminus

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V3.1 on Hugging Face

DeepSeek has released DeepSeek-V3.1, a new text-generation model published on Hugging Face under the deepseek-ai organization. The model supports fp8 precision, text-generation-inference, and endpoint deployment, and has accumulated over 220K downloads and 824 likes shortly after release. This appears to be an updated iteration of the DeepSeek-V3 series, a frontier-class open-weights model family.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V3.1-Base on Hugging Face

DeepSeek has released DeepSeek-V3.1-Base, a new base model for text generation, on Hugging Face. The model supports fp8 precision, safetensors format, and is compatible with text-generation-inference endpoints. With over 1,000 likes and nearly 9,000 downloads shortly after release, it is attracting significant community attention as a successor to the widely-used DeepSeek-V3.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face DeepSeek-V3.1-Base

6Hacker News·9d ago·source ↗

Hugging Face open reproduction of DeepSeek-R1

Hugging Face has published an open reproduction of DeepSeek-R1, the reasoning-focused language model, on GitHub. The project aims to replicate DeepSeek-R1's training methodology and capabilities in an open-weights setting. This contributes to the broader effort to make frontier reasoning model techniques accessible to the research community.

Frontier Model Releases Open Weights Progress DeepSeek V4 Open R1 Hugging Face

5Deepseek News·1mo ago·source ↗

DeepSeek V2.5-1210: Final Update to V2.5 Series, V3 Generation Teased

DeepSeek has released DeepSeek-V2.5-1210, the final update to its V2.5 model series, with claimed improvements across math, coding, writing, and roleplay benchmarks. The model is available as open weights on Hugging Face. DeepSeek also announced the launch of Internet Search on chat.deepseek.com. The release marks the end of the V2 generation, with the company signaling work on next-generation foundation models.

Frontier Model Releases Open Weights Progress DeepSeek V4 deepseek-chat Hugging Face +2 more

6Hugging Face Blog·1mo ago·source ↗

Open-R1: Update #1 — Open Reproduction of DeepSeek-R1

Hugging Face's Open-R1 project provides a first progress update on its open reproduction of DeepSeek-R1, a reasoning-focused language model. The update covers early training runs, dataset construction, and evaluation results aimed at replicating DeepSeek-R1's chain-of-thought reasoning capabilities. This effort is part of the broader open-weights community push to reproduce frontier reasoning models transparently.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 Open R1 Hugging Face +1 more

5Hugging Face Blog·1mo ago·source ↗

Mini-R1: Reproducing DeepSeek R1 'Aha Moment' — An RL Tutorial

A Hugging Face blog post demonstrates how to reproduce DeepSeek R1's emergent 'aha moment' reasoning behavior using reinforcement learning on a countdown game task. The tutorial walks through training a smaller model with RL to exhibit chain-of-thought self-correction, similar to the behavior observed in DeepSeek R1. This serves as a practical open-source replication effort aimed at demystifying R1's training dynamics.

Frontier Model Releases Open Weights Progress DeepSeek V4 GRPO Open R1 +3 more

4Hugging Face Blog·1mo ago·source ↗

How to Deploy and Fine-Tune DeepSeek Models on AWS

Hugging Face published a guide covering deployment and fine-tuning of DeepSeek models on AWS infrastructure. The post addresses practical integration patterns for running DeepSeek-R1 and related models using AWS services alongside Hugging Face tooling. This is a tier-2 commentary/tutorial piece targeting practitioners who want to operationalize DeepSeek models in cloud environments.

Open Weights Progress Inference Economics DeepSeek V4 Hugging Face Amazon Web Services +1 more

7Hugging Face Blog·1mo ago·source ↗

Open-R1: a fully open reproduction of DeepSeek-R1

Hugging Face announced Open-R1, a community effort to fully reproduce DeepSeek-R1's training pipeline using open-source components. The project aims to replicate the data, training, and evaluation stages of DeepSeek-R1, making the entire process transparent and accessible. This follows significant interest in DeepSeek-R1's reinforcement-learning-based reasoning approach and addresses the lack of fully open reproduction of that methodology.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 Open R1 Hugging Face +2 more

5Hugging Face Blog·1mo ago·source ↗

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+

Hugging Face publishes a retrospective and forward-looking commentary marking one year since the 'DeepSeek moment,' examining how DeepSeek's open-weight releases reshaped the global open-source AI ecosystem. The piece analyzes the downstream effects on model development, inference economics, and competitive dynamics between open and closed AI labs. It situates these developments within a broader 'AI+' framing, suggesting a new phase of AI integration across industries.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face +2 more

5Hugging Face Blog·1mo ago·source ↗

Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek

A Hugging Face blog post reflecting on one year since the 'DeepSeek moment' examines the architectural decisions shaping China's open-source AI ecosystem. The piece analyzes how Chinese labs have built upon and diverged from DeepSeek's design choices in the intervening year. It situates these developments within the broader context of open-weights model progress and competitive dynamics between Chinese and Western AI development.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face +1 more

5Hugging Face Blog·1mo ago·source ↗

One Year Since the "DeepSeek Moment"

A Hugging Face retrospective marking one year since the DeepSeek moment, which shook assumptions about AI development costs and open-weights competitiveness. The piece likely reflects on how DeepSeek's efficient training approach influenced the broader AI landscape, open-weights progress, and inference economics over the past year. Published on the anniversary of the original release, it offers industry analysis from a major open-source AI platform perspective.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face +1 more

9Anthropic News·19d ago·source ↗

Anthropic Identifies Industrial-Scale Distillation Attacks by DeepSeek, Moonshot, and MiniMax

Anthropic has publicly identified three Chinese AI laboratories—DeepSeek, Moonshot AI, and MiniMax—as conducting coordinated, large-scale distillation attacks against Claude, generating over 16 million exchanges through approximately 24,000 fraudulent accounts in violation of terms of service. The campaigns targeted Claude's most differentiated capabilities including agentic reasoning, tool use, coding, and chain-of-thought generation, with MiniMax alone responsible for over 13 million exchanges. Anthropic frames these attacks as a national security concern, arguing that illicitly distilled models strip out safety safeguards and undermine US export controls. The company claims high-confidence attribution via IP correlation, request metadata, and infrastructure indicators, in some cases corroborated by industry partners.

Frontier Model Releases Open Weights Progress knowledge distillation Kimi DeepSeek V4 +9 more

5Hacker News·12d ago·source ↗

DeepSeek V4 Pro reportedly beats GPT-5.5 Pro on precision

A community-discussed item (267 HN points, 120 comments) claims DeepSeek V4 Pro outperforms GPT-5.5 Pro on a precision metric. The source is a third-party site aggregating or reporting on benchmark comparisons between two frontier models. The discussion signals continued interest in the open-weights vs. closed-model competitive dynamic.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 GPT Pro OpenAI

7Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V4-Flash-Base on Hugging Face

DeepSeek has released DeepSeek-V4-Flash-Base, a new open-weights base model, on Hugging Face. The model uses FP8 precision and the deepseek_v4 architecture with safetensors format. Early traction is notable with over 66,000 downloads and 241 likes shortly after release, suggesting significant community interest in a 'Flash' variant of the V4 series.

Frontier Model Releases Open Weights Progress DeepSeek V4 DeepSeek-V4-Flash Hugging Face +1 more

6Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V3.2-Speciale on Hugging Face

DeepSeek has published DeepSeek-V3.2-Speciale, a new text-generation model, on Hugging Face under the deepseek-ai organization. The model uses the deepseek_v32 architecture and supports fp8 precision with safetensors format. Early traction is notable with nearly 10,000 downloads and 708 likes shortly after release.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face DeepSeek-V3.2-Speciale

6Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-OCR-2 vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR-2, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture and tagged for OCR and vision-language tasks. The model has accumulated over 1.8 million downloads and 980 likes, indicating substantial community uptake. It extends DeepSeek's multimodal model lineup with a specialized document/OCR capability.

Open Weights Progress Multimodal Progress DeepSeek-OCR-2 DeepSeek V4 Hugging Face

6Deepseek·11d ago·source ↗

DeepSeek releases R1-0528-Qwen3-8B distilled reasoning model on Hugging Face

DeepSeek released DeepSeek-R1-0528-Qwen3-8B, an 8B parameter text-generation model on Hugging Face, combining the R1-0528 reasoning capabilities with a Qwen3 base. The model has accumulated over 306K downloads and 1K likes shortly after release, indicating strong community uptake. This appears to be a distilled version of the R1-0528 reasoning model targeting smaller-scale deployment.

Frontier Model Releases Open Weights Progress DeepSeek-R1-0528 DeepSeek V4 DeepSeek-R1-0528-Qwen3-8B +3 more

6Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-Math-V2 on Hugging Face

DeepSeek has released DeepSeek-Math-V2, a math-specialized text-generation model, on Hugging Face. The model uses the deepseek_v32 architecture and is available in fp8 format with safetensors support. Early engagement metrics show 697 likes and 416 downloads, suggesting notable community interest for a new release.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face DeepSeek-Math-V2

6Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-OCR vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture. The model targets OCR and image feature extraction tasks and has accumulated over 2.4 million downloads and 3,275 likes, indicating significant community uptake. This represents an open-weights multimodal release from a major Chinese AI lab.

Open Weights Progress Multimodal Progress DeepSeek-OCR-2 DeepSeek V4

6Deepseek·11d ago·source ↗

DeepSeek releases DeepSeek-V3.2-Exp-Base on Hugging Face

DeepSeek has published DeepSeek-V3.2-Exp-Base, an experimental base model for text generation, on Hugging Face. The model uses the deepseek_v32 architecture and supports fp8 precision with safetensors format. This appears to be a new experimental iteration in the DeepSeek-V3 series, though no technical details or benchmark results are provided in the release metadata.

Frontier Model Releases Open Weights Progress DeepSeek V4 Hugging Face

6Interconnects·1mo ago·source ↗

Latest open artifacts (#21): Open model bonanza — Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others

Interconnects' recurring open-weights roundup covers a dense cluster of recent releases including Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, and GLM-5.1, characterizing the period as a flagship-after-flagship cadence. The piece also includes commentary on CAISI's assessment of DeepSeek V4. As a tier-2 commentary source, this is a synthesis and analysis layer rather than primary announcements.

Frontier Model Releases Evaluation and Benchmarking MiMo 2.5 Interconnects DeepSeek V4 +7 more

7The Batch·19d ago·source ↗

Data Points: OpenAI and Microsoft sever their exclusive relationship

This edition of The Batch covers several major AI industry developments: OpenAI has revised its partnership with Microsoft, ending exclusivity while retaining Microsoft as primary cloud partner through 2032 and gaining freedom to deploy on AWS and Google Cloud. DeepSeek released V4 model weights featuring 1M-token context and Huawei Ascend chip optimization, though it trails leading open and closed models on aggregate benchmarks. Google and Amazon are deepening investments in Anthropic with up to $40B and $25B respectively in funding-for-compute deals, and an agentic AI system autonomously designed a functional RISC-V CPU from a 219-word spec in 12 hours.

Training Infrastructure Frontier Model Releases Google Cloud Google TPU knowledge distillation +25 more

5Hugging Face Blog·1mo ago·source ↗

Open R1: Update #3

Hugging Face's Open R1 project releases its third update, continuing the open-source replication effort of DeepSeek-R1's reasoning model training pipeline. The update likely covers progress on data, training runs, and evaluation results for the community-driven reproduction. This is part of an ongoing effort to make frontier reasoning model capabilities accessible via open weights and open training code.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 Open R1 Hugging Face +1 more

5Hugging Face Blog·1mo ago·source ↗

Open R1: Update #2

Hugging Face's Open R1 project releases its second progress update on the open-source replication of DeepSeek-R1's reasoning capabilities. The update likely covers training progress, dataset releases, and intermediate model checkpoints as the team works toward a fully open reproduction of the reasoning model pipeline. Open R1 is a community-driven effort to make the techniques behind frontier reasoning models accessible to researchers.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 Open R1 Hugging Face +1 more

5Hugging Face Blog·1mo ago·source ↗

Open R1: Update #4

Hugging Face's Open R1 project releases its fourth progress update on the open reproduction of DeepSeek-R1. The update likely covers training progress, dataset releases, and evaluation results for the open-weights reasoning model effort. This project is a community-driven attempt to replicate and open-source the techniques behind DeepSeek-R1's chain-of-thought reasoning capabilities.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 Open R1 Hugging Face +1 more

4Hacker News·27d ago·source ↗

DeepSeek Reasonix: Native Coding Agent with High Caching and Low Cost

DeepSeek Reasonix is a coding agent built natively on DeepSeek models, emphasizing high prompt caching rates and low inference cost. The project attracted significant Hacker News engagement (349 points, 171 comments), suggesting community interest in cost-efficient agentic coding workflows. It appears to be an open-source or community-developed tool rather than an official DeepSeek Labs release.

Open Weights Progress Inference Economics DeepSeek V4 DeepSeek Reasonix +1 more

6The Batch·18d ago·source ↗

The Batch Issue 345: Iranian Drone Attacks on AWS Data Centers, Qwen3.5, DeepSeek-Huawei, and AI Job Insecurity

Andrew Ng's weekly newsletter covers several significant AI-adjacent developments: Iranian drones struck at least three Amazon Web Services data centers in Bahrain and the UAE, disrupting cloud services and raising concerns given U.S. military use of AWS to run Anthropic Claude; the issue also previews Qwen3.5 model releases across multiple sizes and DeepSeek's reported moves involving Huawei hardware. Ng also addresses widespread job insecurity across skill levels amid rapid AI advancement, citing geopolitical risks including the Iran war, Taiwan uncertainty, and rare-earth metal supply chains as compounding factors.

Training Infrastructure Frontier Model Releases DeepLearning.AI DeepSeek V4 Claude +7 more