technique
Value-aware Stochastic KV Cache Eviction
technique · active · provisional
value-aware-stochastic-kv-cache-eviction-a1056256 · 1 event · first seen 13d ago
Aliases: Value-aware Stochastic KV Cache Eviction
More like this (12)
KV Cache · KV Cache Quantization · Key-Value Cache · Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization · Private Stochastic Convex Optimization · SnapKV · key-value (KV) activation projection · Dynamic-Probabilistic Consistency Gap · Context-Driven Incremental Compression · Vector Policy Optimization · Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes · amortized variational inference
Recent events (1)
VaSE: Value-Aware Stochastic KV Cache Eviction improves reasoning model efficiency
A new arXiv preprint introduces Value-aware Stochastic KV Cache Eviction (VaSE), a training-free method for compressing the KV cache of long chain-of-thought reasoning models. The authors identify two failure modes in prior eviction approaches: catastrophic repetition loops caused by evicting high-magnitude value states, and low cache diversity. They address both with targeted protections and stochastic eviction. On six reasoning tasks with Qwen3 models at 4x compression, VaSE reportedly outperforms the current best selection-based sparse attention method and exceeds the strongest eviction baseline by over 4%, while supporting FlashAttention2 and maintaining a static memory footprint.
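The summary above does not give the preprint's actual scoring rule or sampling scheme, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes (hypothetically) that each cached token has an importance score and a value-state norm, that the highest-norm value states are exempt from eviction, and that the remaining slots are filled by score-weighted sampling without replacement rather than a deterministic top-k. The function name `evict_kv` and all of its parameters are invented for illustration.

```python
import random


def evict_kv(scores, value_norms, budget, n_protect, rng=None):
    """Return the sorted indices of cache entries to keep.

    Hypothetical inputs (not taken from the preprint):
      scores      -- non-negative per-token importance, e.g. accumulated attention
      value_norms -- per-token magnitude of the value state
      budget      -- fixed number of entries kept (static memory footprint)
      n_protect   -- number of highest-norm value states never evicted
    """
    rng = rng or random.Random()
    n = len(scores)
    if len(value_norms) != n:
        raise ValueError("scores and value_norms must have the same length")
    if budget >= n:
        return list(range(n))  # nothing needs to be evicted

    # Assumed "protection" step: always keep the tokens whose value
    # states have the largest magnitude.
    n_protect = min(n_protect, budget)
    by_norm = sorted(range(n), key=lambda i: value_norms[i], reverse=True)
    protected = set(by_norm[:n_protect])

    # Assumed "stochastic" step: fill the remaining slots by weighted
    # sampling without replacement (Efraimidis-Spirakis keys u ** (1 / w)),
    # so low-scoring tokens still have some chance of surviving.
    candidates = [i for i in range(n) if i not in protected]
    slots = budget - len(protected)
    keys = {i: rng.random() ** (1.0 / max(scores[i], 1e-12)) for i in candidates}
    sampled = sorted(candidates, key=lambda i: keys[i], reverse=True)[:slots]

    return sorted(protected | set(sampled))
```

A call such as `evict_kv(scores, value_norms, budget=len(scores) // 4, n_protect=8)` would correspond to a 4x compression ratio under these assumptions. Because the kept set always has exactly `budget` entries, the cache size stays constant across decoding steps.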