technique
Context Caching on Disk
techniqueactive
context-caching-on-disk-bf157c94·1 events·first seen 1mo agoAliases: Context Caching on Disk
Co-occurring entities
More like this (12)
Recent events (1)
DeepSeek API Introduces Context Caching on Disk, Cutting Token Prices by ~90%
DeepSeek has launched a disk-based context caching service for its API, reducing cache-hit token pricing to $0.014 per million tokens versus $0.14 for cache misses—a 90% cost reduction. The system requires no code changes, runs automatically for prefix-matched inputs, and reduces first-token latency from ~13s to ~500ms on 128K prompts. DeepSeek attributes the feasibility of disk caching to the compact KV cache produced by its MLA (Multi-head Latent Attention) architecture in DeepSeek V2, which it claims makes it the first LLM API provider to deploy extensive disk caching at scale. The service supports up to 1 trillion tokens per day with no concurrency limits.