CLP (Collocation-Length Predictor)

technique · active · provisional · clp-collocation-length-predictor--1be0d11c · 1 event · first seen 7d ago

Aliases: CLP (Collocation-Length Predictor), Collocation-Length Predictor

Recent events (1)

5 · arXiv · cs.AI · 7d ago

CLP: Lightweight collocation-length predictor achieves zero-loss multi-token inference speedup

Researchers propose CLP (Collocation-Length Predictor), a span-level decision layer that accelerates LLM inference through multi-token prediction (MTP) without degrading output quality. The key idea is "Backbone-as-Architect": the backbone LM head always generates the first token, and the MTP heads generate only the subsequent tokens. This removes the competition between the heads and the backbone that caused repetitive outputs in prior methods. CLP itself is a single linear layer (about 4.6K–7.7K parameters), compared with the 1M-parameter gate networks used in prior work, and it achieves 1.14×–1.29× speedups on Qwen2.5 models with a near-zero repetition ratio. The paper also reports that shorter prediction horizons improve MTP head accuracy on larger models, and offers this as a scaling-aware design principle.
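The decoding step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation. As the name suggests, it assumes that CLP maps the backbone's hidden state to a choice of span length (1 to `max_span`) with one linear layer. It also assumes that the MTP heads can be treated as simple callables on that hidden state. The names `SpanLengthPredictor`, `decode_step`, `max_span`, and the toy heads are all hypothetical. The sketch also shows why such a layer stays tiny: its parameter count is `hidden_size * max_span + max_span`.

```python
import random
from typing import Callable, List, Sequence

Head = Callable[[Sequence[float]], int]  # hypothetical: hidden state -> token id


class SpanLengthPredictor:
    """Hypothetical CLP-style layer: a single linear map from a hidden state
    to logits over span lengths 1..max_span."""

    def __init__(self, hidden_size: int, max_span: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.max_span = max_span
        # One weight row per span-length class, plus one bias per class.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(max_span)]
        self.bias = [0.0] * max_span

    def num_parameters(self) -> int:
        # hidden_size * max_span weights + max_span biases
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def predict(self, hidden: Sequence[float]) -> int:
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.weight, self.bias)]
        # Class index i corresponds to a span of i + 1 tokens.
        best = max(range(self.max_span), key=lambda i: logits[i])
        return best + 1


def decode_step(hidden: Sequence[float], backbone_head: Head,
                mtp_heads: List[Head], clp: SpanLengthPredictor) -> List[int]:
    """One "Backbone-as-Architect" step (assumed form): the backbone head always
    emits the first token, and MTP heads emit only the remaining span - 1 tokens."""
    span = clp.predict(hidden)
    tokens = [backbone_head(hidden)]
    for head in mtp_heads[: span - 1]:
        tokens.append(head(hidden))
    return tokens


if __name__ == "__main__":
    # Toy setup: force the predictor to choose a span of 3 tokens.
    clp = SpanLengthPredictor(hidden_size=4, max_span=5)
    clp.weight = [[0.0] * 4 for _ in range(5)]
    clp.bias = [0.0, 0.0, 5.0, 0.0, 0.0]
    heads = [lambda h, k=k: 200 + k for k in range(4)]  # hypothetical MTP heads
    print(decode_step([1.0, 2.0, 3.0, 4.0], lambda h: 100, heads, clp))
    # Parameter count for a hypothetical hidden size of 1536 and 5 span classes.
    print(SpanLengthPredictor(hidden_size=1536, max_span=5).num_parameters())
```

Because the span decision happens before any MTP head runs, a span of 1 falls back to ordinary backbone-only decoding in this sketch. Compared with a gate network that scores each head separately, a linear layer this size adds only a few thousand parameters for hidden sizes in the low thousands.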