Almanac
GPT-2-small

model · active · provisional · gpt-2-small-6a17af37 · 1 event · first seen 12h ago

Aliases: GPT-2-small

Recent events (1)

5 · arXiv · cs.CL · 12h ago

VASAE: Vocabulary-Aligned Sparse Autoencoder assigns intrinsic token names to SAE features during training

Researchers introduce VASAE (Vocabulary-Aligned Sparse Autoencoder), a method that trains SAE features with vocabulary-aligned anchoring so each feature is intrinsically named by the nearest token in the model's embedding space. Applied to GPT-2-small and Llama-3.1-8B, VASAE achieves ~90% feature alignment in shallow-to-middle layers without degrading reconstruction quality, though final-layer alignment is limited. The work addresses a longstanding interpretability bottleneck where SAE dictionary features require expensive post-hoc labeling, potentially enabling more scalable mechanistic analysis.
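The naming step described above, where a feature takes the name of its nearest token in embedding space, can be illustrated with a small sketch. This is not the paper's training procedure: the summary does not specify the anchoring loss or how alignment is scored, so the cosine-similarity lookup, the `alignment_rate` helper, its 0.9 threshold, and the toy three-dimensional embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def name_features(feature_dirs, token_embeddings):
    """Return (nearest_token, similarity) for each feature direction."""
    names = []
    for direction in feature_dirs:
        scored = ((tok, cosine(direction, emb)) for tok, emb in token_embeddings.items())
        names.append(max(scored, key=lambda pair: pair[1]))
    return names


def alignment_rate(names, threshold=0.9):
    """Fraction of features whose best similarity reaches the (hypothetical) threshold."""
    if not names:
        return 0.0
    return sum(1 for _, sim in names if sim >= threshold) / len(names)


# Toy stand-ins for a model's token embedding table and an SAE's feature directions.
token_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "run": [0.0, 0.0, 1.0],
}
feature_dirs = [
    [0.9, 0.1, 0.0],  # points almost exactly at "cat"
    [0.0, 2.0, 0.1],  # points almost exactly at "dog" (scale does not matter)
    [0.5, 0.5, 0.5],  # equidistant from all tokens, so poorly aligned
]

names = name_features(feature_dirs, token_embeddings)
for index, (token, sim) in enumerate(names):
    print(f"feature {index}: nearest token {token!r}, cosine {sim:.3f}")
print(f"aligned fraction: {alignment_rate(names):.2f}")
```

In this toy setup the first two features sit close to a single token and count as aligned, while the third does not. A real analysis would read the embedding matrix and the SAE's feature directions from the trained models rather than hand-written lists.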