dataset
LM1B
dataset · active · provisional
lm1b-5a1309c0 · 1 event · first seen 7d ago
Aliases: LM1B
Recent events (1)
K-Forcing: Joint multi-token decoding via push-forward language modeling distillation
K-Forcing is a new inference acceleration paradigm that distills an autoregressive model into a push-forward mapping that generates k tokens per forward pass rather than one. The method uses progressive self-forcing distillation to match the teacher's sequence distribution, achieving 2.4–3.5x speedup at k=4 with modest quality degradation. Unlike speculative decoding, K-Forcing is designed to address high-load batch serving scenarios common in industrial deployment, while remaining compatible with standard AR infrastructure.
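As a rough illustration of why emitting k tokens per forward pass reduces decoding work, the sketch below counts forward passes in a generic decode loop. It is not the K-Forcing implementation: `decode`, `forward_passes`, and `toy_step` are hypothetical names, and `toy_step` is a stand-in for a model that simply emits consecutive integers.

```python
import math


def forward_passes(num_tokens: int, k: int) -> int:
    """Forward passes needed to emit num_tokens when each pass yields k tokens."""
    if num_tokens < 0 or k < 1:
        raise ValueError("num_tokens must be >= 0 and k must be >= 1")
    return math.ceil(num_tokens / k)


def decode(step_fn, prompt, num_tokens, k):
    """Generic decode loop: each call to step_fn(context, k) is one forward pass.

    step_fn is a hypothetical interface, assumed to return up to k new tokens.
    Returns the generated tokens and the number of passes used.
    """
    context = list(prompt)
    generated = []
    passes = 0
    while len(generated) < num_tokens:
        remaining = num_tokens - len(generated)
        new_tokens = step_fn(context, k)[:remaining]  # trim the final pass
        if not new_tokens:  # guard against a step function that emits nothing
            break
        context.extend(new_tokens)
        generated.extend(new_tokens)
        passes += 1
    return generated, passes


def toy_step(context, k):
    """Toy stand-in for a model: emits the next k consecutive integers."""
    last = context[-1] if context else 0
    return [last + i + 1 for i in range(k)]


if __name__ == "__main__":
    for k in (1, 4):
        tokens, passes = decode(toy_step, [0], 10, k)
        print(f"k={k}: {passes} passes for {len(tokens)} tokens")
```

The pass count only bounds the achievable gain: at k=4 it falls by at most 4x, while the reported speedup is 2.4–3.5x. The summary does not break down where the difference comes from.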