Value-aware Stochastic KV Cache Eviction

technique · active · provisional · value-aware-stochastic-kv-cache-eviction-a1056256 · 1 event · first seen 13d ago

Aliases: Value-aware Stochastic KV Cache Eviction

Recent events (1)

6 · arXiv · cs.CL · 13d ago

VaSE: Value-Aware Stochastic KV Cache Eviction improves reasoning model efficiency

A new arXiv preprint introduces Value-aware Stochastic KV Cache Eviction (VaSE), a training-free method for compressing KV caches in long-chain-of-thought reasoning models. The authors identify two failure modes in prior eviction approaches: catastrophic repetition loops caused by evicting high-magnitude value states, and low cache diversity. They address both with targeted protections and stochastic eviction. On six reasoning tasks with Qwen3 models at 4x compression, VaSE outperforms the current best selection-based sparse attention method and exceeds the strongest eviction baseline by over 4%, while supporting FlashAttention2 and maintaining a static memory footprint.
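The summary above does not give the preprint's scoring rule or sampling scheme, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes each cached token has a value-state norm and some importance score. It protects the tokens with the largest value norms, then fills the rest of a fixed budget by weighted sampling without replacement. The function name, its parameters, and the Gumbel top-k sampling are all assumptions made for illustration.

```python
import math
import random


def select_kept_tokens(value_norms, scores, budget, n_protected,
                       temperature=1.0, seed=None):
    """Return sorted indices of the tokens to keep in a fixed-size cache.

    Hypothetical sketch, not the method from the preprint:
    - tokens with the largest value-state norms are always kept;
    - remaining slots are filled stochastically, favouring high scores.
    """
    n = len(value_norms)
    if len(scores) != n:
        raise ValueError("value_norms and scores must have equal length")
    if not 0 <= n_protected <= budget:
        raise ValueError("n_protected must be between 0 and budget")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if budget >= n:
        return list(range(n))  # nothing needs to be evicted

    rng = random.Random(seed)

    # Value-aware protection: never evict the highest-magnitude value states.
    by_norm = sorted(range(n), key=lambda i: value_norms[i], reverse=True)
    protected = by_norm[:n_protected]
    protected_set = set(protected)
    candidates = [i for i in range(n) if i not in protected_set]

    # Stochastic selection via the Gumbel top-k trick: perturbing each scaled
    # score with Gumbel noise and taking the top k samples k items without
    # replacement, with probabilities proportional to exp(score / temperature).
    def noisy_key(i):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        return scores[i] / temperature - math.log(-math.log(u))

    ranked = sorted(candidates, key=noisy_key, reverse=True)
    sampled = ranked[: budget - n_protected]

    # The result always has exactly `budget` entries (static footprint).
    return sorted(protected + sampled)


if __name__ == "__main__":
    # Toy data: 8 cached tokens compressed to a budget of 4.
    norms = [0.1, 9.0, 0.2, 0.3, 8.0, 0.1, 0.2, 0.4]
    attn = [0.5, 0.1, 0.9, 0.2, 0.1, 0.3, 0.7, 0.4]
    print(select_kept_tokens(norms, attn, budget=4, n_protected=2, seed=0))
```

In this sketch a lower `temperature` makes selection closer to deterministic top-k by score, while a higher one spreads the kept set across more varied tokens, which is one plausible way to trade importance against cache diversity.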