6 · Hacker News (AI-filtered, score >= 200) · 33h ago

DeepSeek releases DSpark: speculative decoding system for LLM inference acceleration

DeepSeek published DSpark, a paper describing a speculative decoding system designed to accelerate LLM inference. The paper is hosted on DeepSeek's GitHub and drew significant Hacker News engagement (598 points, 228 comments), indicating strong community interest. Speculative decoding is an actively researched inference optimization technique, and a release from DeepSeek carries weight given the company's track record on inference efficiency.

Training Infrastructure Inference Economics speculative decoding DeepSeek V4 DSpark

Related guides (3)

Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI

Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production

Speculative decoding · Concept

Speculative Decoding: Making AI Faster Without Changing the Answer

Related events (8)

5 · DeepSeek · 39h ago

DeepSeek releases DeepSeek-V4-Pro-DSpark on Hugging Face

DeepSeek has published a new model checkpoint, DeepSeek-V4-Pro-DSpark, on Hugging Face under the text-generation category. The model uses the deepseek_v4 architecture and supports FP8 and 8-bit quantization formats. The 'DSpark' suffix suggests a variant or specialized version of the DeepSeek V4 Pro line, though no accompanying technical documentation is visible in this listing.

Frontier Model Releases Open Weights Progress DeepSeek V4 DeepSeek-V4-Pro-DSpark Hugging Face

5 · DeepSeek · 39h ago

DeepSeek releases DeepSeek-V4-Flash-DSpark on Hugging Face

DeepSeek has published a new model checkpoint, DeepSeek-V4-Flash-DSpark, on Hugging Face under the deepseek_v4 model family. The release is tagged as a text-generation model with FP8 and 8-bit support, suggesting an efficiency-optimized variant. The 'Flash' and 'DSpark' naming implies a faster or distilled derivative of the DeepSeek V4 flagship. Download counts are near zero, indicating a very recent upload.

Frontier Model Releases Inference Economics DeepSeek V4 DeepSeek-V4-Flash Hugging Face

3 · DeepSeek · 9h ago

DeepSeek releases EAGLE3 speculative decoding draft model for Qwen3-4B

DeepSeek published eagle3_qwen3_4b_ttt7 on Hugging Face, a draft model for EAGLE3 speculative decoding targeting the Qwen3-4B base model. EAGLE3 is the third generation of the EAGLE speculative decoding framework, which accelerates inference by having a lightweight draft model propose future tokens that the target model then verifies. The release is a narrow inference-optimization artifact with zero downloads and likes at time of indexing, suggesting it is very recent or experimental.
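As a rough illustration of the draft-and-verify idea behind speculative decoding, here is a minimal sketch of the greedy variant using toy stand-in models. It is not EAGLE3 or DSpark: `target_model`, `draft_model`, `greedy_decode`, and `speculative_decode` are hypothetical names invented for this example, and the "models" are simple deterministic functions rather than neural networks. The sketch assumes greedy decoding, where a drafted token is accepted only if it equals the token the target model would have produced.

```python
from typing import Callable, List, Tuple

# A "model" here maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large, slow model (a toy deterministic rule).
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the small, fast model: it agrees with the
    # target except when the prefix length is a multiple of 4.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0  # counts verification rounds, not individual calls
    while len(out) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks the proposal. A real system scores all
        #    positions in one batched forward pass; this toy loop calls the
        #    target per position but counts the round as a single pass.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch
                break
            accepted.append(tok)
        else:
            # Every drafted token matched: the target adds one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], target_passes
```

Because every accepted token equals what the target model would have chosen at that position, the output matches plain greedy decoding with the target model. The potential speedup comes from needing fewer target verification rounds than generated tokens, which depends on how often the draft agrees with the target.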

Open Weights Progress Inference Economics Eagle3 DeepSeek V4 Qwen3-4B +2 more

3 · DeepSeek · 9h ago

DeepSeek releases EAGLE3 speculative decoding draft model for Qwen3-8B

DeepSeek published eagle3_qwen3_8b_ttt7 on Hugging Face, a draft model for EAGLE3 speculative decoding targeting the Qwen3-8B base model. EAGLE3 is the third generation of the EAGLE speculative decoding framework, which accelerates inference by having a lightweight draft head propose future tokens that the target model then verifies. The release is a narrow inference-optimization artifact with minimal engagement at time of indexing.

Inference Economics Eagle3 DeepSeek V4 Qwen3-4B +2 more

8 · DeepSeek News · 1mo ago

DeepSeek Releases V3.2-Exp with Sparse Attention Architecture and 50%+ API Price Cut

DeepSeek has released DeepSeek-V3.2-Exp, an experimental model built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve long-context performance and reduce compute costs during training and inference. Benchmarks indicate that V3.2-Exp performs on par with V3.1-Terminus while achieving efficiency gains. The release is accompanied by an API price reduction of more than 50%, effective immediately, along with open weights on Hugging Face, a technical report, and GPU kernel code in TileLang and CUDA.

Training Infrastructure Long Context Evolution DeepSeek API DeepSeek V4 TileLang +5 more

3 · DeepSeek · 9h ago

DeepSeek releases EAGLE3 speculative decoding draft model for Qwen3-14B

DeepSeek published eagle3_qwen3_14b_ttt7 on Hugging Face, a draft model for EAGLE3 speculative decoding targeting the Qwen3-14B base model. EAGLE3 is the third generation of the EAGLE speculative decoding framework, designed to accelerate inference. The release is a narrow infrastructure artifact with zero downloads and likes at time of indexing, suggesting it is very recent or experimental.

Inference Economics Eagle3 DeepSeek V4 Qwen3-14B +1 more

4 · Hacker News · 1mo ago

DeepSeek Reasonix: Native Coding Agent with High Caching and Low Cost

DeepSeek Reasonix is a coding agent built natively on DeepSeek models, emphasizing high prompt-cache hit rates and low inference cost. The project drew significant Hacker News engagement (349 points, 171 comments), suggesting community interest in cost-efficient agentic coding workflows. It appears to be an open-source or community-developed tool rather than an official DeepSeek release.

Open Weights Progress Inference Economics DeepSeek V4 DeepSeek Reasonix +1 more

6 · arXiv · cs.AI · 26d ago

SimSD: Speculative Decoding Adapted for Diffusion Language Models

SimSD introduces a training-free speculative decoding algorithm for diffusion large language models (dLLMs), which previously could not use standard token-level speculative decoding because of their bidirectional attention and masked language modeling formulation. The method uses a plug-and-play masking strategy that introduces reference tokens from a draft model together with a custom attention mask, enabling valid logit computation for drafted tokens in a single forward pass. Evaluated on SDAR-family dLLMs across four benchmarks, SimSD achieves up to a 7.46x improvement in decoding throughput while maintaining or improving generation quality. The approach is compatible with other acceleration techniques such as KV caching and blockwise decoding.

Frontier Model Releases Inference Economics KV Cache speculative decoding SDAR +4 more