Almanac
technique

ProxySPEX

technique · active · proxyspex-d3a6c61e · 1 event · first seen 1mo ago

Aliases: ProxySPEX

Recent events (1)

Berkeley AI Research (BAIR) Blog · 1mo ago

SPEX and ProxySPEX: Scalable Interaction Discovery for LLM Interpretability

Researchers from BAIR introduce SPEX (Spectral Explainer) and ProxySPEX, algorithms for identifying influential interactions among features, data, and model components in LLMs at scale. The approach exploits three structural properties of these interactions (sparsity, low degree, and hierarchy) to reframe interaction discovery as a sparse recovery problem, using tools from signal processing and coding theory. By leveraging the hierarchical structure, ProxySPEX achieves performance comparable to SPEX with roughly 10x fewer ablations. The methods are evaluated on feature attribution (sentiment analysis), data attribution, and mechanistic interpretability tasks, outperforming marginal methods such as LIME at long context lengths.
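
To make the "sparse recovery" framing concrete, here is a minimal Python sketch under stated assumptions. It is not the SPEX or ProxySPEX algorithm: it swaps in plain least squares over an explicit basis of low-degree parity (Fourier) terms, which is only feasible for a handful of features. The names `toy_model` and `parity_features`, the feature count, and the sample budget are all hypothetical, chosen for illustration; `toy_model` stands in for querying an LLM with some inputs ablated.

```python
import itertools
import numpy as np


def parity_features(masks, subsets):
    """Build one column per subset S: the character (-1)^(sum of mask bits in S).

    For the empty subset the sum is 0, so the column is all ones (the intercept).
    """
    cols = [(-1.0) ** masks[:, list(s)].sum(axis=1) for s in subsets]
    return np.column_stack(cols)


def toy_model(masks):
    """Hypothetical stand-in for a model output under ablation (1 = feature kept).

    It depends on features 0 and 1 individually and on features 2 and 5 jointly.
    """
    return 2.0 * masks[:, 0] - 1.0 * masks[:, 1] + 1.5 * masks[:, 2] * masks[:, 5]


rng = np.random.default_rng(0)
n_features = 10
n_ablations = 300  # far fewer than the 2**10 = 1024 possible masks

# Random ablation patterns and the corresponding (toy) model outputs.
masks = rng.integers(0, 2, size=(n_ablations, n_features))
y = toy_model(masks)

# Low-degree assumption: only consider interactions among at most 2 features.
subsets = [s for d in range(3) for s in itertools.combinations(range(n_features), d)]

# Fit the coefficients of the low-degree expansion from the sampled ablations.
X = parity_features(masks, subsets)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sparsity assumption: only a few coefficients should be meaningfully nonzero.
top = sorted(zip(subsets, coef), key=lambda t: -abs(t[1]))[:6]
for subset, value in top:
    print(subset, round(float(value), 3))
```

In this toy setting, the pair `(2, 5)` should surface among the largest coefficients, alongside the individual features it is built from. The explicit basis grows combinatorially with the number of features and the interaction degree, which is the scaling problem the summary says SPEX and ProxySPEX address with signal-processing and coding-theory tools.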