Entity · technique

Context Caching on Disk

technique · active · context-caching-on-disk-bf157c94 · 1 event · first seen May 18, 2026

Aliases: Context Caching on Disk

Co-occurring entities

DeepSeek API · DeepSeek V4 · Multi-head Latent Attention (MLA)

More like this (12)

connection-scoped caching · Prompt Caching · KV Cache · IndexCache · Context-Driven Incremental Compression · in-context learning · context-rot · ReContext · Key-Value Cache · Content Credentials · Canonical-Context On-Policy Distillation (CCOPD) · LMCache

Recent events (1)

7 · Deepseek News · May 18, 2026

DeepSeek API Introduces Context Caching on Disk, Cutting Token Prices by ~90%

DeepSeek has launched a disk-based context caching service for its API. Tokens served from the cache are billed at $0.014 per million, versus $0.14 per million for cache misses, a 90% cost reduction. The cache requires no code changes: it is applied automatically to prefix-matched inputs, and it reduces first-token latency from ~13s to ~500ms on 128K prompts. DeepSeek attributes the feasibility of disk caching to the compact KV cache produced by its MLA (Multi-head Latent Attention) architecture in DeepSeek V2, and claims to be the first LLM API provider to deploy extensive disk caching at scale. The service supports up to 1 trillion tokens per day with no concurrency limits.
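The pricing arithmetic above can be sketched in a few lines of Python. This is only an illustration built on the two prices quoted in the summary; the helper names `estimate_input_cost` and `savings_fraction` are hypothetical and not part of the DeepSeek API, and the hit/miss token counts are assumed to be known to the caller.

```python
# Prices quoted in the event summary, in USD per million tokens.
CACHE_HIT_PRICE_PER_M = 0.014
CACHE_MISS_PRICE_PER_M = 0.14


def estimate_input_cost(hit_tokens: int, miss_tokens: int) -> float:
    """Hypothetical helper: prompt cost in USD for one request.

    hit_tokens  -- prompt tokens served from the disk cache
    miss_tokens -- prompt tokens that had to be recomputed
    """
    if hit_tokens < 0 or miss_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (hit_tokens * CACHE_HIT_PRICE_PER_M
             + miss_tokens * CACHE_MISS_PRICE_PER_M)
    return total / 1_000_000


def savings_fraction(hit_tokens: int, miss_tokens: int) -> float:
    """Fraction of prompt cost saved relative to a fully uncached request."""
    total_tokens = hit_tokens + miss_tokens
    if total_tokens == 0:
        return 0.0
    uncached_cost = estimate_input_cost(0, total_tokens)
    cached_cost = estimate_input_cost(hit_tokens, miss_tokens)
    return 1.0 - cached_cost / uncached_cost


if __name__ == "__main__":
    # A 128K-token prompt where the first 120K tokens match a cached prefix.
    print(estimate_input_cost(120_000, 8_000))
    print(savings_fraction(120_000, 8_000))
```

Under these assumptions the full 90% saving applies only when the entire prompt is a cache hit; in the 120K/8K split above the saving works out to roughly 84%, since the unmatched suffix is still billed at the cache-miss rate.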

Long Context Evolution · Frontier Model Releases · DeepSeek API · DeepSeek V4 · Context Caching on Disk