Almanac
DeepSeek API

product·active·deepseek-api-2385cf9c·3 events·first seen 1mo ago

Aliases: DeepSeek API

Recent events (3)

7·Deepseek News·1mo ago

DeepSeek API Introduces Context Caching on Disk, Cutting Token Prices by ~90%

DeepSeek has launched a disk-based context caching service for its API. Cache-hit tokens are priced at $0.014 per million, versus $0.14 per million for cache misses, a 90% reduction. The service requires no code changes: it applies automatically to prefix-matched inputs, and it reduces first-token latency from ~13s to ~500ms on 128K prompts. DeepSeek attributes the feasibility of disk caching to the compact KV cache produced by the MLA (Multi-head Latent Attention) architecture in DeepSeek V2, and claims to be the first LLM API provider to deploy disk caching extensively at scale. The service supports up to 1 trillion tokens per day with no concurrency limits.
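As a rough illustration of the pricing above, the following sketch estimates the input cost of a request when part of its prompt is served from cache. Only the two per-million prices come from the summary; the function names `shared_prefix_len` and `input_cost`, and the idea of counting the cached portion as the token prefix shared with an earlier request, are simplifying assumptions for illustration and not DeepSeek's actual billing or matching logic.

```python
# Prices quoted in the summary above, in USD per million tokens.
CACHE_HIT_PRICE = 0.014
CACHE_MISS_PRICE = 0.14


def shared_prefix_len(previous, current):
    """Count how many leading tokens two token sequences have in common.

    Hypothetical helper: a simplified stand-in for prefix matching.
    """
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


def input_cost(total_tokens, cached_tokens):
    """Estimate the input cost in USD given how many tokens hit the cache."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    missed_tokens = total_tokens - cached_tokens
    return (cached_tokens * CACHE_HIT_PRICE
            + missed_tokens * CACHE_MISS_PRICE) / 1_000_000


if __name__ == "__main__":
    # Two requests that share a long prefix (token IDs are made up).
    first = list(range(100_000)) + [1, 2, 3]
    second = list(range(100_000)) + [7, 8, 9]
    cached = shared_prefix_len(first, second)
    print(f"cached tokens: {cached}")
    print(f"cost without cache: ${input_cost(len(second), 0):.6f}")
    print(f"cost with cache:    ${input_cost(len(second), cached):.6f}")
```

Under these assumptions, a prompt that is fully cached costs one tenth of an uncached one, which matches the 90% figure in the summary; partially cached prompts fall in between.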

9·Deepseek News·1mo ago

DeepSeek-R1 Release: Open-Source Reasoning Model on Par with OpenAI o1

DeepSeek has released DeepSeek-R1, a reasoning-focused large language model that it claims performs on par with OpenAI o1 on math, code, and reasoning benchmarks. The model is fully open-source: its weights and outputs are covered by the MIT License, which permits distillation and commercial use. Six smaller distilled models are also released, the largest at 32B and 70B; these two variants reportedly match OpenAI o1-mini. API access is live at $0.55/M input tokens and $2.19/M output tokens, significantly lower than comparable frontier models.
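To make the quoted API prices concrete, here is a minimal sketch that converts token counts into an estimated cost per request. The two rates are taken from the summary above; the function name `request_cost` and the example token counts are hypothetical, and the sketch ignores any cache discounts or later price changes.

```python
# Rates quoted in the summary above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.55
OUTPUT_PRICE_PER_M = 2.19


def request_cost(input_tokens, output_tokens):
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Made-up example: a short prompt with a long reasoning-style answer.
    cost = request_cost(input_tokens=2_000, output_tokens=8_000)
    print(f"estimated cost: ${cost:.5f}")
```

Because output tokens are priced roughly four times higher than input tokens here, long generated answers tend to dominate the estimate.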

8·Deepseek News·1mo ago

DeepSeek Releases V3.2-Exp with Sparse Attention Architecture and 50%+ API Price Cut

DeepSeek has released DeepSeek-V3.2-Exp, an experimental model built on V3.1-Terminus that introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve long-context performance and reduce compute costs during training and inference. Benchmarks indicate V3.2-Exp performs on par with V3.1-Terminus while achieving efficiency gains. The release is accompanied by a 50%+ API price reduction effective immediately, open-weights release on Hugging Face, a technical report, and GPU kernel code in TileLang and CUDA.