Entity · paper

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

paper · active · unstable-features-reproducible-subspaces-understanding-seed-dependence-in-sparse-autoencoders-4780e86d · 1 event · first seen Jun 11, 2026

Aliases: Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Co-occurring entities

Sparse Autoencoders

More like this (12)

Cross-seed explainability using Procrustes-conditioned Joint End-to-end Top-K Sparse Autoencoders
Sparse Autoencoders
Sparse Autoencoder
C²R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders
Sparse Autoencoders Encode Both Concepts and Functions: The Downstream Geometry of Feature Effects
Sparse Autoencoders (SAEs)
Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders
Sparse Embedding Models
conditional variational autoencoder
Recovery Subspace Dimensionality
Feature Auto-Encoder
Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning

Recent events (1)

arXiv · cs.CL · Jun 11, 2026

Study finds SAE unstable features reflect reproducible subspaces, not pure noise

A new arXiv paper investigates feature stability in sparse autoencoders (SAEs), measuring the probability that individual learned features reappear across independent training runs. The authors find a functional asymmetry: stable features carry most reconstruction-relevant signal, while unstable features are individually non-reproducible but concentrate in reproducible lower-rank subspaces, suggesting seed dependence reflects basis ambiguity rather than noise. A synthetic model confirms that low-rank ground-truth features can be recovered at the subspace level even when individual SAE latents are non-identifiable across seeds. The work has direct implications for interpretability research that relies on SAE features as meaningful, stable units of analysis.
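
The distinction between feature-level and subspace-level reproducibility can be illustrated with a small NumPy sketch. This is not the paper's code or its stability metric: the helper names `max_cosine_match` and `subspace_overlap`, the toy dimensions, and the 45-degree rotation are all assumptions chosen for illustration. Two hypothetical "seeds" learn different bases of the same 2-D subspace, so no individual feature matches well across runs, yet the principal angles between the two spans are zero.

```python
import numpy as np


def max_cosine_match(dict_a, dict_b):
    """For each column (feature) of dict_a, return its best absolute
    cosine similarity with any column of dict_b."""
    a = dict_a / np.linalg.norm(dict_a, axis=0, keepdims=True)
    b = dict_b / np.linalg.norm(dict_b, axis=0, keepdims=True)
    return np.abs(a.T @ b).max(axis=1)


def subspace_overlap(dict_a, dict_b):
    """Return the cosines of the principal angles between the column
    spans of dict_a and dict_b (1.0 means the directions coincide)."""
    qa, _ = np.linalg.qr(dict_a)
    qb, _ = np.linalg.qr(dict_b)
    return np.linalg.svd(qa.T @ qb, compute_uv=False)


# Toy setup (assumed): a shared 2-D subspace inside a 16-D activation space.
rng = np.random.default_rng(0)
d, r = 16, 2
basis, _ = np.linalg.qr(rng.normal(size=(d, r)))

# "Seed B" learns the same subspace, but in a basis rotated by 45 degrees.
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
features_seed_a = basis
features_seed_b = basis @ rotation

feature_scores = max_cosine_match(features_seed_a, features_seed_b)
principal_cosines = subspace_overlap(features_seed_a, features_seed_b)

print("best per-feature cosine:", feature_scores)          # about 0.707 each
print("principal-angle cosines:", principal_cosines)       # about 1.0 each
```

Under a per-feature matching threshold such as 0.9 (an arbitrary choice here), both features would count as unstable, even though the two runs agree exactly on the subspace they span.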

Evaluation and Benchmarking · AI Safety Research · Sparse Autoencoders · Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders