technique

CLP (Collocation-Length Predictor)

technique · active · provisional · clp-collocation-length-predictor--1be0d11c · 1 event · first seen 7d ago

Aliases: CLP (Collocation-Length Predictor), Collocation-Length Predictor

Co-occurring entities

Qwen2.5 · Alibaba · CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference · Backbone-as-Architect

More like this (12)

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference · Character-level Language Model · LAMDA-CL · Contrastive Language-Image Pretraining (CLIP) · LALS (Latent Association Leaning Score) · CLAX-PT · CLIP · Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs · lean-lsp-mcp · long-context LLMs · Latent Context Language Models · TailLoR

Recent events (1)

5 · arXiv · cs.AI · 7d ago

CLP: Lightweight collocation-length predictor achieves zero-loss multi-token inference speedup

Researchers propose CLP (Collocation-Length Predictor), a span-level decision layer for accelerating LLM inference via multi-token prediction without quality degradation. The key insight is 'Backbone-as-Architect': the backbone LM head always generates the first token while MTP heads handle only subsequent tokens, eliminating head-backbone competition that causes repetitive outputs in prior methods. CLP uses a single linear layer (~4.6K–7.7K parameters) versus 1M-parameter gate networks in prior work, achieving 1.14x–1.29x speedup on Qwen2.5 models with near-zero repetition ratio. The paper also establishes that shorter prediction horizons improve MTP head accuracy on larger models, offering a scaling-aware design principle.
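
The decision flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the summary does not say what the predictor takes as input, how it is trained, or how predicted spans are checked. Here the predictor is assumed to be one linear layer over the last hidden state that outputs one logit per candidate span length, and `backbone_head`, `mtp_heads`, `predict_span_length`, `decode_span`, and `clp_param_count` are hypothetical names introduced for this sketch.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Head = Callable[[Vector], int]  # maps a hidden state to a token id


def clp_param_count(hidden_size: int, max_span: int) -> int:
    """Weights plus biases of one linear layer: hidden state -> span-length logits."""
    return hidden_size * max_span + max_span


def predict_span_length(hidden: Vector, weights: Sequence[Vector], bias: Vector) -> int:
    """Pick a span length in 1..max_span by argmax over the linear logits."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    best_index = max(range(len(logits)), key=logits.__getitem__)
    return best_index + 1  # row 0 means "emit 1 token", row 1 means 2, ...


def decode_span(
    hidden: Vector,
    backbone_head: Head,
    mtp_heads: Sequence[Head],
    weights: Sequence[Vector],
    bias: Vector,
) -> List[int]:
    """Backbone-as-Architect step: the backbone head always emits the first
    token; MTP heads only fill the remaining positions of the predicted span."""
    span = predict_span_length(hidden, weights, bias)
    tokens = [backbone_head(hidden)]
    tokens += [head(hidden) for head in mtp_heads[: span - 1]]
    return tokens


# Toy values (assumptions for illustration only).
hidden = [1.0, -2.0]
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # 3 candidate lengths
bias = [0.0, 0.0, 0.0]
backbone_head = lambda h: 10
mtp_heads = [lambda h: 11, lambda h: 12]

print(decode_span(hidden, backbone_head, mtp_heads, weights, bias))
```

With these toy values the logits are 0, 1, and -2, so the predicted span length is 2: the backbone head supplies the first token and one MTP head supplies the second. Under the same single-linear-layer assumption, `clp_param_count(1536, 3)` gives 4,611 parameters for a hypothetical hidden size of 1536 and three length classes, which shows how such a predictor can stay in the low thousands of parameters.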

Inference Economics · Qwen2.5 · Alibaba · CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference