low-rank subspace projection

technique · active · low-rank-subspace-projection-d36675ca · 1 event · first seen 25d ago

Aliases: low-rank subspace projection

Recent events (1)

arXiv · cs.CL · 25d ago

Self-Policy Distillation via Capability-Selective Subspace Projection

This paper introduces Self-Policy Distillation (SPD), a self-distillation method for LLMs that requires no external signals such as correctness filters or reward models. SPD extracts a low-rank capability subspace from the model's own gradients on correctness-defining tokens, then projects KV activations into this subspace during self-generation to isolate task-relevant signal from stylistic noise. Experiments across code generation, math reasoning, and QA show up to 13% improvement over prior signal-free self-distillation methods and 15% better out-of-domain generalization.
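
The general idea of low-rank subspace projection can be illustrated with a minimal NumPy sketch. This is not the SPD procedure from the paper: it assumes the subspace is taken as the top-k right singular directions of a matrix, and it uses a synthetic random matrix as a stand-in for gradients. The names `top_k_subspace` and `project` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def top_k_subspace(grads: np.ndarray, k: int) -> np.ndarray:
    """Return an orthonormal basis (d x k) spanning the top-k right
    singular directions of `grads` (n x d)."""
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:k].T


def project(acts: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonally project each row of `acts` (m x d) onto span(basis)."""
    return acts @ basis @ basis.T


rng = np.random.default_rng(0)
d, k = 16, 3

# Assumption: synthetic "gradients" with a rank-k signal plus small noise.
signal_dirs = rng.standard_normal((k, d))
grads = rng.standard_normal((200, k)) @ signal_dirs
grads += 0.01 * rng.standard_normal((200, d))

basis = top_k_subspace(grads, k)

# Assumption: random vectors standing in for activations.
acts = rng.standard_normal((5, d))
projected = project(acts, basis)
```

Projecting a second time leaves `projected` unchanged, which is the defining property of an orthogonal projection; components of `acts` outside the k-dimensional subspace are discarded.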