Entity · benchmark

RULER-CWE

benchmark · active · ruler-cwe-9a890bd8 · 1 event · first seen Jun 10, 2026

Aliases: RULER-CWE

Co-occurring entities

Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs · SwiGLU · Qwen2.5-8B

More like this (12)

RULER · RULER QA-2 · CWE-Trace · CWQ · QUBRIC · R2R-CE · CRAM · Delta Rule · CM-LRS · OCR-Robust · RL² · C-RASP

Recent events (1)

5 · arXiv · cs.CL · Jun 10, 2026

Predictor-gated bank-wise sparsity recipe for dense-to-sparse LLM upcycling from Qwen2.5-8B

A new arXiv preprint introduces a continual-training recipe that converts dense LLMs into channel-sparse models without post-hoc pruning. Starting from a Qwen2.5-8B checkpoint, the method uses a low-rank predictor to gate FFN channel routing and applies a bank-wise top-k rule to reach 4x sparsity in the FFN intermediate activations at 32K context. Because the routing module is trained on the main language-modeling path, the resulting sparsity is hardware-oriented rather than approximate. The authors also identify and patch a layer-local long-context failure mode on the RULER-CWE benchmark.
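The bank-wise top-k rule mentioned in the summary can be illustrated with a minimal sketch. This is not the preprint's implementation: the function name `bank_wise_topk_mask`, the contiguous bank layout, the bank size, and the tie-breaking rule are all assumptions made for illustration. The idea shown is only that channels are split into equal-sized banks and a fixed number of channels is kept in each bank, so every bank carries the same number of active channels (keeping 1 of every 4 gives 4x sparsity).

```python
from typing import List


def bank_wise_topk_mask(scores: List[float], bank_size: int, keep_per_bank: int) -> List[int]:
    """Return a 0/1 mask keeping the `keep_per_bank` highest-scoring
    channels inside each contiguous bank of `bank_size` channels.

    Hypothetical helper: the contiguous bank layout and the tie-breaking
    (lower index wins) are assumptions, not details from the preprint.
    """
    if bank_size <= 0 or len(scores) % bank_size != 0:
        raise ValueError("len(scores) must be a positive multiple of bank_size")
    if not 0 <= keep_per_bank <= bank_size:
        raise ValueError("keep_per_bank must be between 0 and bank_size")

    mask = [0] * len(scores)
    for start in range(0, len(scores), bank_size):
        bank = range(start, start + bank_size)
        # sorted() is stable, so equal scores keep index order (lower index first).
        kept = sorted(bank, key=lambda i: -scores[i])[:keep_per_bank]
        for i in kept:
            mask[i] = 1
    return mask


# Made-up predictor scores for 8 FFN channels, split into 2 banks of 4.
scores = [0.1, 0.4, 0.3, 0.2, 0.8, 0.7, 0.95, 0.6]
mask = bank_wise_topk_mask(scores, bank_size=4, keep_per_bank=1)
print(mask)  # [0, 1, 0, 0, 0, 0, 1, 0]
```

In this example a global top-2 would keep channels 6 and 4, both in the second bank, while the bank-wise rule keeps exactly one channel per bank (channels 1 and 6). That uniform per-bank count is the kind of regular structure that makes such sparsity easier to map onto hardware.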

Training Infrastructure · Inference Economics · Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs · SwiGLU · RULER-CWE