Entity · technique

key-value (KV) activation projection

technique · active · key-value-kv-activation-projection-f4d85c94 · 1 event · first seen May 22, 2026

Aliases: key-value (KV) activation projection

Co-occurring entities

large language models · low-rank subspace projection · on-policy self-distillation

More like this (12)

visual-token activation probing · SnapKV · Multimodal Voice Activity Projection · FreqDepthKV · Do transformers need three projections? Systematic study of QKV variants · Activation Atlases · KV Cache · Task Vectors · VPT Model · KVPress · Subspace Projection · Routing-Conditioned Projection

Recent events (1)

arXiv · cs.CL · May 22, 2026

Self-Policy Distillation via Capability-Selective Subspace Projection

This paper introduces Self-Policy Distillation (SPD), a self-distillation method for LLMs that requires no external signals such as correctness filters or reward models. SPD extracts a low-rank capability subspace from the model's own gradients on correctness-defining tokens, then projects KV activations into this subspace during self-generation to isolate task-relevant signal from stylistic noise. Experiments across code generation, math reasoning, and QA show up to 13% improvement over prior signal-free self-distillation methods and 15% better out-of-domain generalization.
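The projection step described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes the capability subspace is taken as the top singular directions of a gradient matrix and that KV activations are orthogonally projected onto it. The function names `capability_subspace` and `project`, the chosen rank, and the random placeholder tensors are all hypothetical and used only for illustration.

```python
import numpy as np


def capability_subspace(grads: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis (d x rank) for the top-`rank`
    directions of a gradient matrix `grads` of shape (n, d).

    Assumption: the subspace is obtained by truncated SVD; the source
    does not say how SPD actually extracts it.
    """
    # Rows of vt are right singular vectors, sorted by singular value.
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:rank].T


def project(acts: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonally project activations of shape (..., d) onto span(basis)."""
    # (acts @ basis) gives coordinates in the subspace; multiplying by
    # basis.T maps them back to the original d-dimensional space.
    return (acts @ basis) @ basis.T


rng = np.random.default_rng(0)
d, rank = 16, 4

# Placeholder for gradients collected on correctness-defining tokens.
grads = rng.normal(size=(32, d))
basis = capability_subspace(grads, rank)

# Placeholder key and value activations: (heads, tokens, head_dim).
keys = rng.normal(size=(2, 5, d))
values = rng.normal(size=(2, 5, d))

keys_proj = project(keys, basis)
values_proj = project(values, basis)
```

In this sketch the projected keys and values keep their original shape but lie entirely in the rank-4 subspace, so any component orthogonal to it (the "stylistic noise" in the summary's terms) is discarded. Whether SPD projects per head, per layer, or on the full hidden state is not stated in the summary and is left open here.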

Frontier Model Releases · Evaluation and Benchmarking · large language models · key-value (KV) activation projection · low-rank subspace projection · +2 more