dataset
TinyStories
dataset · active · provisional
tinystories-3ba67f93 · 1 event · first seen 3d ago
Aliases: TinyStories
Recent events (1)
Program synthesis used to reverse-engineer transformer attention heads with executable Python surrogates
Researchers propose a pipeline that approximates transformer attention heads with executable Python programs. A language model generates candidate programs, which are then re-ranked by held-out predictive accuracy. Applied to GPT-2, TinyLlama-1.1B, and Llama-3B, fewer than 1,000 programs reproduce attention patterns with more than 75% average IoU similarity on TinyStories. Replacing 25% of attention heads with programmatic surrogates raises perplexity by only 16% on average while preserving downstream QA performance, suggesting a path toward symbolic transparency in neural models.
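The re-ranking step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how IoU is computed or how attention is discretized, so the 0.5 binarization threshold, the toy attention matrix, and all names here (`iou`, `binarize`, `previous_token`, `first_token`, `rerank`) are hypothetical choices made for illustration only.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (query position, key position)
Program = Callable[[List[str]], Set[Edge]]


def iou(pred: Set[Edge], target: Set[Edge]) -> float:
    """Intersection over union of two sets of attended (query, key) pairs."""
    union = pred | target
    return len(pred & target) / len(union) if union else 1.0


def binarize(attn: List[List[float]], threshold: float = 0.5) -> Set[Edge]:
    """Keep the (query, key) pairs whose weight reaches the threshold.

    The threshold is an assumption; the source does not specify one.
    """
    return {
        (q, k)
        for q, row in enumerate(attn)
        for k, weight in enumerate(row)
        if weight >= threshold
    }


# Two hypothetical candidate programs a language model might propose.
def previous_token(tokens: List[str]) -> Set[Edge]:
    return {(q, q - 1) for q in range(1, len(tokens))}


def first_token(tokens: List[str]) -> Set[Edge]:
    return {(q, 0) for q in range(len(tokens))}


def rerank(
    candidates: Dict[str, Program],
    held_out: List[Tuple[List[str], List[List[float]]]],
) -> List[Tuple[float, str]]:
    """Score each candidate by mean IoU on held-out examples, best first."""
    scored = []
    for name, program in candidates.items():
        total = sum(iou(program(toks), binarize(attn)) for toks, attn in held_out)
        scored.append((total / len(held_out), name))
    return sorted(scored, reverse=True)


# Toy held-out example: a head that mostly attends to the previous token.
tokens = ["Once", "upon", "a", "time"]
attn = [
    [1.00, 0.00, 0.00, 0.00],
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
]
ranking = rerank(
    {"previous_token": previous_token, "first_token": first_token},
    [(tokens, attn)],
)
print(ranking)
```

On this toy input the `previous_token` candidate ranks first, since it matches three of the four attended pairs, while `first_token` matches only two and predicts two pairs the head does not attend to. A real pipeline would score candidates over many held-out sequences and over attention extracted from an actual model.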