model
GPT-2-small
model · active · provisional
gpt-2-small-6a17af37 · 1 event · first seen 12h ago
Aliases: GPT-2-small
Recent events (1)
VASAE: Vocabulary-Aligned Sparse Autoencoder assigns intrinsic token names to SAE features during training
Researchers introduce VASAE (Vocabulary-Aligned Sparse Autoencoder), a method that anchors sparse autoencoder (SAE) features to the vocabulary during training, so that each feature is intrinsically named by the nearest token in the model's embedding space. Applied to GPT-2-small and Llama-3.1-8B, VASAE achieves ~90% feature alignment in shallow-to-middle layers without degrading reconstruction quality, though final-layer alignment is limited. The work addresses a longstanding interpretability bottleneck, namely that SAE dictionary features require expensive post-hoc labeling, and could enable more scalable mechanistic analysis.
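To make "named by the nearest token in the model's embedding space" concrete, here is a minimal sketch of a nearest-token lookup on toy data. It is not the VASAE training procedure, which the summary does not describe. The cosine-similarity metric, the `threshold` used to count a feature as aligned, and the names `name_features` and `alignment_rate` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def name_features(feature_dirs, token_embeddings, threshold=0.8):
    """Name each feature direction by its most similar token embedding.

    ASSUMPTION: cosine similarity as the notion of "nearest", and a fixed
    threshold to decide whether a feature counts as aligned. Neither is
    taken from the source.
    """
    names = []
    for direction in feature_dirs:
        best_token, best_sim = None, -1.0
        for token, emb in token_embeddings.items():
            sim = cosine(direction, emb)
            if sim > best_sim:
                best_token, best_sim = token, sim
        names.append((best_token, best_sim, best_sim >= threshold))
    return names

def alignment_rate(names):
    """Fraction of features whose best match clears the threshold."""
    return sum(1 for _, _, aligned in names if aligned) / len(names)

# Toy 3-dimensional "embedding space" (hypothetical values, not real GPT-2 weights).
token_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}
# Toy feature directions: two close to a single token, one spread across all three.
feature_dirs = [
    [0.9, 0.1, 0.0],
    [0.0, 2.0, 0.1],
    [1.0, 1.0, 1.0],
]

names = name_features(feature_dirs, token_embeddings)
for token, sim, aligned in names:
    print(f"{token!r}: similarity={sim:.3f}, aligned={aligned}")
print(f"alignment rate: {alignment_rate(names):.2f}")
```

In this toy setup the first two directions sit close to one token each, while the third is equally far from all tokens and falls below the assumed threshold, which is the kind of case an alignment percentage would count as unaligned. With a real model, the token embeddings would come from the model's embedding matrix and the directions from the trained SAE.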