Almanac
technique

L0 regularization

techniqueactiveprovisionall0-regularization-4edced4f·1 events·first seen 8h ago

Aliases: L0 regularization

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.CL·8h ago·source ↗

ConSA: Learned FA/SWA allocation for efficient hybrid attention in LLMs

ConSA is a framework that learns optimal assignments between full attention and sliding-window attention layers under a user-specified sparsity target, using L0 regularization and augmented Lagrangian constraints. Evaluated on 0.6B and 1.7B parameter models, learned allocations consistently outperform hand-crafted rule-based baselines, with KV-head-wise granularity outperforming layer-wise. A consistent structural pattern emerges: SWA concentrates in bottom layers while FA clusters in contiguous middle-layer blocks, diverging from the evenly interleaved patterns used in existing hybrid architectures.