model

GPT-2-small

model · active · provisional · gpt-2-small-6a17af37 · 1 event · first seen 12h ago

Aliases: GPT-2-small

Co-occurring entities

VASAE, Llama-3.1-8B

More like this (12)

GPT-2, GPT-2 124M, GPT-3.5, GPT-2 355M, GPT-1, GPT-5.2-high, GPT-3, GPT-5.2, DistilGPT-2, GPT-4.1 mini, GPT-J, GPT-4b micro

Recent events (1)

5 · arXiv · cs.CL · 12h ago

VASAE: Vocabulary-Aligned Sparse Autoencoder assigns intrinsic token names to SAE features during training

Researchers introduce VASAE (Vocabulary-Aligned Sparse Autoencoder), a method that trains SAE features with vocabulary-aligned anchoring so each feature is intrinsically named by the nearest token in the model's embedding space. Applied to GPT-2-small and Llama-3.1-8B, VASAE achieves ~90% feature alignment in shallow-to-middle layers without degrading reconstruction quality, though final-layer alignment is limited. The work addresses a longstanding interpretability bottleneck: SAE dictionary features normally require expensive post-hoc labeling. Removing that labeling step could enable more scalable mechanistic analysis.
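To make "named by the nearest token in the model's embedding space" concrete, here is a minimal sketch of the read-out step only, not of VASAE training itself. The summary does not say which similarity measure or alignment criterion the authors use, so cosine similarity and the 0.9 threshold are assumptions, as are the function names `cosine` and `name_features` and the toy three-token vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def name_features(feature_dirs, token_embeddings, threshold=0.9):
    """Name each feature direction after its most similar token embedding.

    Returns a list of (token, similarity, aligned) tuples. The threshold
    is an assumed alignment criterion, not one taken from the paper.
    """
    names = []
    for direction in feature_dirs:
        best_token, best_sim = max(
            ((tok, cosine(direction, emb)) for tok, emb in token_embeddings.items()),
            key=lambda pair: pair[1],
        )
        names.append((best_token, best_sim, best_sim >= threshold))
    return names


# Toy data (hypothetical): a 3-token vocabulary and three feature directions.
token_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}
feature_dirs = [
    [0.9, 0.1, 0.0],  # close to "cat"
    [0.1, 0.2, 0.9],  # close to "the"
    [0.5, 0.5, 0.5],  # equally far from every token
]

names = name_features(feature_dirs, token_embeddings)
alignment_rate = sum(aligned for _, _, aligned in names) / len(names)

for token, sim, aligned in names:
    print(f"{token!r}: similarity={sim:.3f}, aligned={aligned}")
print(f"alignment rate: {alignment_rate:.2f}")
```

In this toy setup two of the three directions clear the threshold, which is the kind of per-layer fraction an alignment figure such as ~90% would summarize under these assumptions.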

Evaluation and Benchmarking, AI Safety Research, GPT-2-small, VASAE, Llama-3.1-8B