Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

arXiv · cs.AI · 3d ago

Sparsity regularizers improve interpretability of Top-k sparse autoencoders for vision models

A new arXiv preprint proposes two sparsity regularizers compatible with Top-k sparse autoencoders (SAEs), a standard tool for mechanistic interpretability of vision foundation models. The regularizers (an ℓ1 penalty on off-support units and a scale-invariant ℓ1/ℓ2-ratio penalty) are applied before Top-k selection and consistently improve monosemanticity without degrading reconstruction quality across two datasets and three vision models. The central finding is that hard architectural sparsity and soft regularization are complementary: the soft penalties address known limitations of fixed-budget Top-k SAEs, such as overfitting to the value of k used during training.
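As a rough illustration of the two penalties named above, the following is a minimal sketch, not the preprint's implementation. It assumes that "off-support units" means the pre-activations that fall outside the Top-k set, and that both penalties are computed on a single pre-activation vector before selection. The function names, the weights `lam_off` and `lam_ratio`, and the way the terms are added to the reconstruction error are all hypothetical.

```python
import math

def topk_support(pre, k):
    """Indices of the k largest pre-activations (the assumed Top-k support)."""
    order = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)
    return set(order[:k])

def off_support_l1(pre, k):
    """l1 norm of the pre-activations that Top-k selection would discard."""
    support = topk_support(pre, k)
    return sum(abs(v) for i, v in enumerate(pre) if i not in support)

def l1_l2_ratio(pre, eps=1e-12):
    """Ratio ||pre||_1 / ||pre||_2; rescaling `pre` leaves it unchanged."""
    l1 = sum(abs(v) for v in pre)
    l2 = math.sqrt(sum(v * v for v in pre))
    return l1 / (l2 + eps)  # eps only guards against an all-zero vector

def regularized_loss(recon_error, pre, k, lam_off=0.0, lam_ratio=0.0):
    """Hypothetical objective: reconstruction error plus the two soft penalties."""
    return (recon_error
            + lam_off * off_support_l1(pre, k)
            + lam_ratio * l1_l2_ratio(pre))

# Example: with k = 2 the support is {0, 3}; the discarded units are 1.0 and 0.5.
pre = [3.0, 1.0, 0.5, 2.0]
print(off_support_l1(pre, k=2))
print(l1_l2_ratio(pre))
```

The ratio equals 1 for a vector with a single nonzero entry and grows to the square root of n for a vector of n equal entries, so minimizing it favors concentrated activations regardless of their overall scale.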