technique
L0 regularization
technique · active · provisional
l0-regularization-4edced4f · 1 event · first seen 8h ago
Aliases: L0 regularization
More like this (12)
KL-Cov regularization
Entropy Regularization
KL-regularized RL
R-Drop consistency regularization
Entropy-Regularized Reinforcement Learning
LoRA (Low-Rank Adaptation)
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
L-infinity perturbation
ELBO variance minimization
Divergence Regularized Policy Optimization
Language Model Finetuning
Recent events (1)
ConSA: Learned FA/SWA allocation for efficient hybrid attention in LLMs
ConSA is a framework that learns how to assign each layer to full attention (FA) or sliding-window attention (SWA) under a user-specified sparsity target, using L0 regularization and augmented Lagrangian constraints. Evaluated on 0.6B- and 1.7B-parameter models, learned allocations consistently outperform hand-crafted rule-based baselines, with KV-head-wise granularity outperforming layer-wise. A consistent structural pattern emerges: SWA concentrates in bottom layers while FA clusters in contiguous middle-layer blocks, diverging from the evenly interleaved patterns used in existing hybrid architectures.
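
To make the combination of L0 regularization and a sparsity constraint concrete, here is a minimal NumPy sketch. It is not ConSA's implementation. It assumes the hard-concrete gate relaxation that is commonly used for L0 regularization, one gate per hypothetical KV head (1 = FA, 0 = SWA), and a standard augmented-Lagrangian-style penalty on the gap between expected and target sparsity. The constants and the names `sample_gates`, `prob_open`, and `constraint_penalty` are illustrative assumptions.

```python
import math
import numpy as np

# Hard-concrete stretch parameters (assumed; commonly used defaults).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_gates(log_alpha, rng):
    """Sample relaxed gates in [0, 1] from the hard-concrete distribution."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)
    # Stretch to (GAMMA, ZETA), then clip so exact 0s and 1s can occur.
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def prob_open(log_alpha):
    """P(gate != 0) per gate: the differentiable surrogate for the L0 norm."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))

def constraint_penalty(log_alpha, target_sparsity, lam, mu):
    """Penalty lam * gap + (mu / 2) * gap**2 on the expected-sparsity gap."""
    expected_sparsity = 1.0 - prob_open(log_alpha).mean()
    gap = expected_sparsity - target_sparsity
    return lam * gap + 0.5 * mu * gap ** 2, gap

rng = np.random.default_rng(0)
log_alpha = np.zeros(8)  # hypothetical: one learnable logit per KV head
gates = sample_gates(log_alpha, rng)
penalty, gap = constraint_penalty(log_alpha, target_sparsity=0.5, lam=0.0, mu=1.0)
```

In a training loop under these assumptions, `penalty` would be added to the task loss, `log_alpha` updated by gradient descent, and the multiplier `lam` updated by gradient ascent so that the expected sparsity converges to the target.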