CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

paper · active · provisional · 1 event · first seen 7d ago

Recent events (1)

arXiv · cs.AI · 7d ago

CLP: Lightweight collocation-length predictor achieves zero-loss multi-token inference speedup

Researchers propose CLP (Collocation-Length Predictor), a span-level decision layer that accelerates LLM inference through multi-token prediction (MTP) without degrading output quality. The key design, called 'Backbone-as-Architect', has the backbone LM head always generate the first token of a span while the MTP heads handle only the subsequent tokens. This removes the competition between heads and backbone that causes repetitive outputs in prior methods. CLP itself is a single linear layer (~4.6K–7.7K parameters), compared with the 1M-parameter gate networks used in prior work, and achieves 1.14x–1.29x speedup on Qwen2.5 models with a near-zero repetition ratio. The paper also reports that shorter prediction horizons improve MTP head accuracy on larger models, which it offers as a scaling-aware design principle.
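The decoding rule described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the argmax decision rule, the toy heads, and the hidden size of 1536 with 3 to 5 length classes are all assumptions chosen for illustration. The summary only states that CLP is a single linear layer of roughly 4.6K–7.7K parameters and that the backbone head always emits the first token.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Head = Callable[[Vector], int]  # maps a hidden state to a token id


def linear_param_count(hidden_size: int, num_lengths: int) -> int:
    """Weights plus biases of one linear layer (hypothetical sizes)."""
    return hidden_size * num_lengths + num_lengths


def predict_span_length(hidden: Vector,
                        weight: Sequence[Vector],
                        bias: Vector) -> int:
    """Single linear layer followed by argmax; class k means a span of k + 1 tokens."""
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weight, bias)]
    best_class = max(range(len(logits)), key=logits.__getitem__)
    return best_class + 1


def decode_step(hidden: Vector,
                backbone_head: Head,
                mtp_heads: List[Head],
                weight: Sequence[Vector],
                bias: Vector) -> List[int]:
    """Emit one span: backbone head first, MTP heads only for later tokens."""
    span = predict_span_length(hidden, weight, bias)
    tokens = [backbone_head(hidden)]      # first token always from the backbone
    for head in mtp_heads[:span - 1]:     # MTP heads fill positions 2..span
        tokens.append(head(hidden))
    return tokens


# Toy usage with stand-in heads (assumed, not real model heads).
toy_hidden = [1.0, 0.0]
toy_weight = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]  # 3 length classes
toy_bias = [0.0, 0.0, 0.0]
toy_tokens = decode_step(toy_hidden,
                         backbone_head=lambda h: 10,
                         mtp_heads=[lambda h: 11, lambda h: 12],
                         weight=toy_weight,
                         bias=toy_bias)
```

Under these assumed sizes, `linear_param_count(1536, 3)` gives 4,611 parameters and `linear_param_count(1536, 5)` gives 7,685, which is consistent with the quoted range, though the actual configuration is not stated in the summary. Because the backbone head is never replaced, a predicted span length of 1 reduces to ordinary one-token decoding.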