Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs

paper · active · 1 event · first seen Jun 10, 2026

Co-occurring entities

SwiGLU · RULER-CWE · Qwen2.5-8B

More like this (12)

PALS: Percentile-Aware Layerwise Sparsity for LLM Pruning
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
A sleep-like consolidation mechanism for LLMs
In-Place Tokenizer Expansion for Pre-trained LLMs
Extending LLM Context via Associative Recurrent Memory
Co-LMLM: Continuous-Query Limited Memory Language Models
Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
Learning from the Self-future: On-policy Self-distillation for dLLMs
Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference
ExpRL: Exploratory RL for LLM Mid-Training
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data

Recent events (1)

5 · arXiv · cs.CL · Jun 10, 2026

Predictor-gated bank-wise sparsity recipe for dense-to-sparse LLM upcycling from Qwen2.5-8B

A new arXiv preprint introduces a continual training recipe to convert dense LLMs into channel-sparse models without post-hoc pruning. Starting from a Qwen2.5-8B checkpoint, the method uses a low-rank predictor to gate FFN channel routing, achieving 4x sparsity in FFN intermediate activations via a bank-wise top-k rule at 32K context. The routing module is trained on the main language modeling path, making the resulting sparsity hardware-oriented rather than approximate. The authors also identify and patch a layer-local long-context failure mode on the RULER-CWE benchmark.
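To make the summary concrete, here is a minimal NumPy sketch of what a predictor-gated, bank-wise top-k rule over SwiGLU FFN channels could look like. It is an illustration only, not the preprint's implementation: the summary does not specify the predictor architecture, the bank size, or how the router is trained, so every name, shape, and default below (`p_down`, `p_up`, `bank_size=8`, `keep_per_bank=2`) is an assumption. Keeping 2 of every 8 channels gives the 4x activation sparsity mentioned above.

```python
import numpy as np


def silu(x):
    """SiLU activation used in SwiGLU: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def bankwise_topk_mask(scores, bank_size, keep_per_bank):
    """Boolean mask keeping the `keep_per_bank` highest scores in each bank.

    Channels are split into contiguous banks of `bank_size` (an assumed
    layout), so every bank keeps the same number of active channels.
    """
    tokens, channels = scores.shape
    if channels % bank_size != 0:
        raise ValueError("channel count must be divisible by bank_size")
    banks = scores.reshape(tokens, channels // bank_size, bank_size)
    # After partitioning, the last `keep_per_bank` positions hold the largest scores.
    kth = bank_size - keep_per_bank
    top_idx = np.argpartition(banks, kth, axis=-1)[..., kth:]
    mask = np.zeros(banks.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=-1)
    return mask.reshape(tokens, channels)


def sparse_swiglu_ffn(x, w_gate, w_up, w_down, p_down, p_up,
                      bank_size=8, keep_per_bank=2):
    """SwiGLU FFN whose intermediate channels are gated by a low-rank predictor.

    The predictor (p_down, p_up) is a hypothetical rank-r factorization that
    scores each FFN channel per token. For clarity the hidden activations are
    computed densely and then masked; a real kernel would skip masked channels.
    """
    scores = (x @ p_down) @ p_up                  # (tokens, d_ff) channel scores
    mask = bankwise_topk_mask(scores, bank_size, keep_per_bank)
    hidden = silu(x @ w_gate) * (x @ w_up)        # dense SwiGLU hidden state
    return (hidden * mask) @ w_down, mask


# Toy shapes chosen only for illustration.
rng = np.random.default_rng(0)
n_tokens, d_model, d_ff, rank = 3, 16, 64, 4
x = rng.standard_normal((n_tokens, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))
p_down = rng.standard_normal((d_model, rank))
p_up = rng.standard_normal((rank, d_ff))

y, mask = sparse_swiglu_ffn(x, w_gate, w_up, w_down, p_down, p_up)
```

In this sketch the fixed per-bank budget is what makes the pattern regular: every bank of 8 channels contributes exactly 2 active channels for every token, so the amount of work per bank is known ahead of time rather than varying with the input.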

Training Infrastructure · Inference Economics