Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders
More like this (12)
Sparse Autoencoders
Sparse Autoencoder
Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders
Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs
Entropy Regularization
Entropy-Regularized Reinforcement Learning
Feature Auto-Encoder
Sparse Autoencoders (SAEs)
intra-frame entropy-guided sparsification
Sparse Embedding Models
conditional variational autoencoder
KL-Cov regularization
Recent events (1)
Sparsity regularizers improve interpretability of Top-k sparse autoencoders for vision models
A new arXiv preprint proposes two sparsity regularizers compatible with Top-k sparse autoencoders (SAEs), a standard tool for mechanistic interpretability of vision foundation models. The regularizers are an ℓ1 penalty on off-support units (those not selected by Top-k) and a scale-invariant ℓ1/ℓ2-ratio penalty, and both are applied before Top-k selection. Across two datasets and three vision models, they consistently improve monosemanticity without degrading reconstruction quality. The central finding is that hard architectural sparsity and soft regularization are complementary, which addresses known limitations of fixed-budget Top-k SAEs, such as overfitting to the value of k used during training.
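The two penalties can be sketched on top of a plain Top-k SAE forward pass. This is a minimal NumPy illustration under stated assumptions, not the preprint's implementation: the ReLU encoder, the helper names (`topk_mask`, `off_support_l1`, `l1_l2_ratio`, `sae_loss`) and the weights `lam_off` and `lam_ratio` are all hypothetical. The off-support term sums the magnitudes of the units that Top-k discards. The ratio term ‖a‖₁/‖a‖₂ does not change when a row is rescaled, which is what makes it scale-invariant.

```python
import numpy as np


def topk_mask(a: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask that is True at the k largest entries of each row of a."""
    # The first k slots of argpartition(-a) hold the indices of the k largest entries of a.
    idx = np.argpartition(-a, k - 1, axis=-1)[..., :k]
    mask = np.zeros(a.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask


def off_support_l1(a: np.ndarray, mask: np.ndarray) -> float:
    """Mean l1 norm of the activations that Top-k discards (off-support units)."""
    return float(np.mean(np.sum(np.abs(a) * ~mask, axis=-1)))


def l1_l2_ratio(a: np.ndarray, eps: float = 1e-8) -> float:
    """Mean ||a||_1 / ||a||_2 per row; eps only guards against all-zero rows."""
    l1 = np.sum(np.abs(a), axis=-1)
    l2 = np.sqrt(np.sum(a * a, axis=-1))
    return float(np.mean(l1 / (l2 + eps)))


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, k, lam_off=1e-3, lam_ratio=1e-3):
    """Reconstruction error plus both penalties, computed on pre-Top-k activations."""
    # Assumption: pre-Top-k activations are ReLU(x W_enc + b_enc).
    a = np.maximum(x @ W_enc + b_enc, 0.0)
    mask = topk_mask(a, k)
    z = a * mask  # hard budget: keep only the k selected units per row
    x_hat = z @ W_dec + b_dec
    recon = float(np.mean(np.sum((x - x_hat) ** 2, axis=-1)))
    # Soft regularizers act on `a`, i.e. before Top-k selection.
    return recon + lam_off * off_support_l1(a, mask) + lam_ratio * l1_l2_ratio(a)
```

Because NumPy has no automatic differentiation, this sketch only evaluates the loss. For training, the same expressions would be written in an autodiff framework such as PyTorch or JAX. That way the penalties also produce gradients for the pre-activations of units that Top-k discards, which the reconstruction term alone does not reach.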