paper
Explaining Attention with Program Synthesis
More like this (12)
ProbSparse Attention
Functional Attention
symbolic attention heads
Neuronal Stochastic Attention Circuit (NSAC)
sparse attention
Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models
How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech
Lie-Algebra Attention
reference attention
Sliding Window Attention
code synthesis LLMs
bidirectional attention
Recent events (1)
Program synthesis used to reverse-engineer transformer attention heads with executable Python surrogates
Researchers propose a pipeline that approximates transformer attention heads with executable Python programs. A language model generates candidate programs, which are then re-ranked by their predictive accuracy on held-out data. Applied to GPT-2, TinyLlama-1.1B, and Llama-3B, a set of fewer than 1,000 programs reproduces the heads' attention patterns with an average IoU above 75% on TinyStories. Replacing 25% of attention heads with these programmatic surrogates raises perplexity by only 16% on average while preserving downstream QA performance, demonstrating a path toward symbolic transparency in neural models.
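The scoring and re-ranking step can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation. The three candidate programs are hand-written stand-ins for LLM-generated ones. The helper names (`binarize`, `iou`, `rank_programs`, `synthetic_previous_token_head`) are hypothetical. Attention patterns are binarized by keeping each query's highest-weight key, because the summary does not say how IoU is computed. The "real" head is a synthetic previous-token matrix rather than weights extracted from GPT-2 or TinyLlama.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A pattern is a set of (query position, key position) pairs the head attends to.
Pattern = Set[Tuple[int, int]]
Program = Callable[[Sequence[str]], Pattern]


def binarize(attn: List[List[float]]) -> Pattern:
    """Keep each query's highest-weight causal key (assumed binarization rule)."""
    return {(q, max(range(q + 1), key=lambda k: row[k])) for q, row in enumerate(attn)}


def iou(a: Pattern, b: Pattern) -> float:
    """Intersection over union of two binarized attention patterns."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical candidate programs, standing in for LLM-generated surrogates.
def previous_token(tokens: Sequence[str]) -> Pattern:
    return {(q, max(q - 1, 0)) for q in range(len(tokens))}


def first_token(tokens: Sequence[str]) -> Pattern:
    return {(q, 0) for q in range(len(tokens))}


def last_same_token(tokens: Sequence[str]) -> Pattern:
    # Attend to the most recent earlier copy of the same token, else to self.
    pattern = set()
    for q, tok in enumerate(tokens):
        earlier = [k for k in range(q) if tokens[k] == tok]
        pattern.add((q, earlier[-1] if earlier else q))
    return pattern


def rank_programs(
    programs: Dict[str, Program],
    held_out: List[Tuple[List[str], List[List[float]]]],
) -> List[Tuple[str, float]]:
    """Score each program by mean IoU on held-out examples, best first."""
    scores = {}
    for name, program in programs.items():
        ious = [iou(program(tokens), binarize(attn)) for tokens, attn in held_out]
        scores[name] = sum(ious) / len(ious)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def synthetic_previous_token_head(n: int) -> List[List[float]]:
    """Toy causal attention matrix: 0.9 on the previous token, 0.1 on itself."""
    attn = [[0.0] * n for _ in range(n)]
    attn[0][0] = 1.0
    for q in range(1, n):
        attn[q][q - 1], attn[q][q] = 0.9, 0.1
    return attn


sentences = ["the cat sat on the mat", "once upon a time there was a fox"]
held_out = [(s.split(), synthetic_previous_token_head(len(s.split()))) for s in sentences]
programs = {
    "previous_token": previous_token,
    "first_token": first_token,
    "last_same_token": last_same_token,
}
ranking = rank_programs(programs, held_out)
for name, score in ranking:
    print(f"{name}: mean IoU {score:.3f}")
```

Here `previous_token` matches the synthetic head exactly (mean IoU 1.0), so it ranks first. In the described pipeline, the candidate pool would come from a language model and the held-out attention maps from the real model. Keeping only the argmax key is one simple binarization choice; a thresholded or top-k rule would be an equally plausible alternative.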