Entity · technique

Value-aware Stochastic KV Cache Eviction

technique · active · value-aware-stochastic-kv-cache-eviction-a1056256 · 1 event · first seen Jun 3, 2026

Aliases: Value-aware Stochastic KV Cache Eviction


More like this (12)

Error Certificates for KV-Cache Eviction via Randomized Design
KV Cache
KV Cache Quantization
Key-Value Cache
Stage-Replay Divergence Follows the KV Cache: Fixed-Prefix Precision Controls and Bidirectional Cache Transplantation
Eviction as Estimation: A Fixed-Lag Smoothing View of Test-Time Memory, and When Measuring Beats Accumulating
FreqDepthKV
Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization
Private Stochastic Convex Optimization
kvcache-ai
SnapKV
Entropy-Aware Dense Pruning

Recent events (1)

arXiv · cs.CL · Jun 3, 2026

VaSE: Value-Aware Stochastic KV Cache Eviction improves reasoning model efficiency

A new arXiv preprint introduces Value-aware Stochastic KV Cache Eviction (VaSE), a training-free method for compressing the KV cache of long chain-of-thought reasoning models. The authors identify two failure modes in prior eviction approaches: catastrophic repetition loops, caused by evicting value states with high magnitude, and low diversity among the cached tokens. VaSE addresses both with targeted protections and stochastic eviction. On six reasoning tasks with Qwen3 models at 4x compression, VaSE outperforms the current best selection-based sparse attention method and exceeds the strongest eviction baseline by over 4%, while remaining compatible with FlashAttention2 and keeping a static memory footprint.
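The summary does not give VaSE's actual scoring rule, protection criterion, or sampling distribution, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each cached token has an importance score (for example, accumulated attention) and a value vector. A fixed fraction of the budget (the hypothetical `protect_frac`) is reserved for the tokens with the largest value-vector L2 norms. The remaining slots are filled by sampling without replacement with probability proportional to `exp(score / temperature)`, using the Gumbel-top-k trick. All names (`select_tokens_to_keep`, `protect_frac`, `temperature`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def value_norm(v: Sequence[float]) -> float:
    """L2 norm of one token's value vector."""
    return math.sqrt(sum(x * x for x in v))


def select_tokens_to_keep(
    scores: Sequence[float],
    values: Sequence[Sequence[float]],
    budget: int,
    protect_frac: float = 0.25,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return sorted indices of the cached tokens that survive eviction.

    scores: hypothetical per-token importance (e.g. accumulated attention).
    values: per-token value vectors; only their magnitude is used here.
    budget: fixed cache size, so memory use stays constant after eviction.
    """
    n = len(scores)
    if budget >= n:
        return list(range(n))  # cache not over budget: nothing to evict
    rng = rng or random.Random()

    # 1) Value-aware protection (assumed rule): always keep the tokens whose
    #    value vectors have the largest norms, regardless of their score.
    n_protect = min(budget, int(protect_frac * budget))
    by_norm = sorted(range(n), key=lambda i: value_norm(values[i]), reverse=True)
    kept = set(by_norm[:n_protect])

    # 2) Stochastic fill (assumed rule): Gumbel-top-k. Adding Gumbel noise to
    #    score / T and taking the top k samples k tokens without replacement
    #    with probability proportional to exp(score / T).
    def gumbel() -> float:
        u = max(rng.random(), 1e-300)  # random() is in [0, 1); avoid log(0)
        return -math.log(-math.log(u))

    rest = [i for i in range(n) if i not in kept]
    keys = {i: scores[i] / temperature + gumbel() for i in rest}
    sampled = sorted(rest, key=keys.__getitem__, reverse=True)[: budget - len(kept)]
    return sorted(kept.union(sampled))


if __name__ == "__main__":
    scores = [0.1, 2.0, 0.3, 1.5, 0.2, 0.9, 0.05, 1.1]
    values = [[0.1, 0.1] for _ in scores]
    values[6] = [5.0, 5.0]  # low score, but a high-magnitude value state
    print(select_tokens_to_keep(scores, values, budget=4, rng=random.Random(0)))
```

In this sketch, token 6 always survives even though its score is the lowest, because its value vector has the largest norm. The other three slots change from run to run, and that randomness is what keeps the cache from collapsing onto the same high-score tokens. As `temperature` approaches zero, the fill step reduces to plain top-k by score.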

Frontier Model Releases · Inference Economics · FlashAttention-3 · Qwen3 · Value-aware Stochastic KV Cache Eviction