RULER-CWE

benchmark · active · provisional · ruler-cwe-9a890bd8 · 1 event · first seen 7d ago

Aliases: RULER-CWE

Recent events (1)

5 · arXiv · cs.CL · 7d ago

Predictor-gated bank-wise sparsity recipe for dense-to-sparse LLM upcycling from Qwen2.5-8B

A new arXiv preprint introduces a continual-training recipe that converts dense LLMs into channel-sparse models without post-hoc pruning. Starting from a Qwen2.5-8B checkpoint, the method uses a low-rank predictor to gate FFN channel routing and applies a bank-wise top-k rule, reaching 4x sparsity in FFN intermediate activations at 32K context. Because the routing module is trained on the main language-modeling path, the resulting sparsity is hardware-oriented rather than approximate. The authors also identify and patch a layer-local long-context failure mode on the RULER-CWE benchmark.
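
To make the bank-wise top-k idea concrete, here is a minimal NumPy sketch of predictor-gated channel selection. It is an illustration under stated assumptions, not the preprint's implementation: the summary does not give the exact formulation, so the per-token selection, the ReLU activation, and every name here (`bankwise_topk_mask`, `gated_ffn`, `p_a`, `p_b`, `num_banks`, `keep_per_bank`) are hypothetical. The FFN channels are split into equal-sized banks and a fixed number of channels is kept in each bank, so keeping 2 of 8 channels per bank corresponds to 4x sparsity.

```python
import numpy as np

def bankwise_topk_mask(scores, num_banks, keep_per_bank):
    """Boolean mask keeping the top `keep_per_bank` channels in each bank.

    scores: array of shape (tokens, channels); channels must divide evenly
    into `num_banks` banks. Selection is per token (an assumption).
    """
    tokens, channels = scores.shape
    if channels % num_banks != 0:
        raise ValueError("channels must be divisible by num_banks")
    bank_size = channels // num_banks
    if not 1 <= keep_per_bank <= bank_size:
        raise ValueError("keep_per_bank must be in [1, bank_size]")

    banked = scores.reshape(tokens, num_banks, bank_size)
    # argpartition on negated scores puts the largest entries first.
    top_idx = np.argpartition(-banked, keep_per_bank - 1, axis=-1)
    top_idx = top_idx[..., :keep_per_bank]

    mask = np.zeros(banked.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=-1)
    return mask.reshape(tokens, channels)

def gated_ffn(x, w_up, w_down, p_a, p_b, num_banks, keep_per_bank):
    """FFN whose intermediate channels are gated by a low-rank predictor.

    The predictor (x @ p_a) @ p_b scores each channel cheaply; only the
    selected channels contribute to the output. ReLU is a placeholder
    activation, not necessarily what the source model uses.
    """
    scores = (x @ p_a) @ p_b                      # (tokens, d_ff), rank-limited
    mask = bankwise_topk_mask(scores, num_banks, keep_per_bank)
    hidden = np.maximum(x @ w_up, 0.0)            # dense here for clarity
    return (hidden * mask) @ w_down, mask

# Toy usage: 32 FFN channels in 4 banks of 8, keeping 2 per bank (4x sparsity).
rng = np.random.default_rng(0)
d_model, d_ff, rank = 16, 32, 4
x = rng.standard_normal((3, d_model))
out, mask = gated_ffn(
    x,
    rng.standard_normal((d_model, d_ff)),
    rng.standard_normal((d_ff, d_model)),
    rng.standard_normal((d_model, rank)),
    rng.standard_normal((rank, d_ff)),
    num_banks=4,
    keep_per_bank=2,
)
```

This sketch computes the full up-projection and then masks it, which saves no work; a hardware-oriented kernel would presumably use the mask to skip the unselected channels. The fixed per-bank count is what makes the pattern regular: every bank contributes the same number of active channels for every token.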