ConSA: Learned FA/SWA allocation for efficient hybrid attention in LLMs
ConSA is a framework that learns how to assign full attention (FA) and sliding-window attention (SWA) across a model under a user-specified sparsity target, using L0 regularization together with augmented Lagrangian constraints. In evaluations on 0.6B- and 1.7B-parameter models, the learned allocations consistently outperform hand-crafted, rule-based baselines, and allocating at KV-head granularity outperforms layer-wise allocation. A consistent structural pattern emerges: SWA concentrates in the bottom layers, while FA clusters in contiguous blocks of middle layers. This differs from the evenly interleaved patterns used in existing hybrid architectures.
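The allocation mechanism summarized above can be illustrated in plain Python. This is a minimal, hypothetical sketch and not ConSA's implementation. It rests on several assumptions that the summary does not confirm:

- There is one hard-concrete gate per KV head, using the L0 relaxation of Louizos et al., 2018. A gate near 1 means FA and a gate near 0 means SWA.
- The sparsity target is the fraction of heads assigned to SWA.
- The target is enforced as an equality constraint with a standard augmented Lagrangian term.
- The final allocation keeps FA on the highest-scoring heads.

All function names and hyperparameters are illustrative. Gradients are omitted. In a real setup, the penalty would be added to the language-modeling loss and minimized over the gate logits with an autodiff framework, while the multiplier is updated by ascent.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch. Hard-concrete constants are the common defaults from
# Louizos et al. (2018); they are assumptions, not values reported for ConSA.
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_gate(log_alpha: float, rng: random.Random) -> float:
    """Stochastic hard-concrete gate in [0, 1]; assumed 1 = FA, 0 = SWA."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # avoid log(0)
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))  # stretch, then clip


def prob_full_attention(log_alpha: float) -> float:
    """P(gate > 0): the differentiable surrogate for one gate's L0 norm."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def augmented_lagrangian_term(
    log_alphas: Sequence[float], target_sparsity: float, lam: float, rho: float
) -> Tuple[float, float]:
    """Penalty for the assumed constraint E[#FA heads] / n == 1 - target_sparsity."""
    expected_fa = sum(prob_full_attention(a) for a in log_alphas) / len(log_alphas)
    violation = expected_fa - (1.0 - target_sparsity)
    return lam * violation + 0.5 * rho * violation ** 2, violation


def dual_update(lam: float, violation: float, rho: float) -> float:
    """Method-of-multipliers ascent step on the Lagrange multiplier."""
    return lam + rho * violation


def discretize(log_alphas: Sequence[float], target_sparsity: float) -> List[str]:
    """Assumed final step: FA on the top-scoring heads, SWA on the rest."""
    n_fa = round((1.0 - target_sparsity) * len(log_alphas))
    ranked = sorted(range(len(log_alphas)), key=lambda i: log_alphas[i], reverse=True)
    fa = set(ranked[:n_fa])
    return ["FA" if i in fa else "SWA" for i in range(len(log_alphas))]


if __name__ == "__main__":
    # Hypothetical gate logits for 8 KV heads and a 75% SWA target.
    log_alphas = [-2.0, 3.0, 0.5, 4.0, -1.0, 2.5, -3.0, 1.0]
    penalty, viol = augmented_lagrangian_term(log_alphas, 0.75, lam=0.0, rho=1.0)
    print(f"penalty={penalty:.4f} violation={viol:.4f}")
    print(discretize(log_alphas, 0.75))
```

With these logits, the expected FA fraction is well above the 25% budget, so the violation is positive. Repeated dual updates would then raise the multiplier and push the gate logits down. The top-k discretization is only one plausible way to turn learned gates into a fixed FA/SWA layout that exactly meets the target.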