Almanac

TinyLlama-1.1B

model · active · provisional · tinyllama-1-1b-ab264c2b · 1 event · first seen 3d ago

Aliases: TinyLlama-1.1B



Recent events (1)

arXiv · cs.LG · 3d ago

Program synthesis used to reverse-engineer transformer attention heads with executable Python surrogates

Researchers propose a pipeline that approximates transformer attention heads with executable Python programs: a language model generates candidate programs, which are then re-ranked by their predictive accuracy on held-out data. Applied to GPT-2, TinyLlama-1.1B, and Llama-3B, the pipeline yields a set of fewer than 1,000 programs that reproduce the models' attention patterns on TinyStories with an average intersection-over-union (IoU) similarity above 75%. Replacing 25% of the attention heads with these programmatic surrogates raises perplexity by only 16% on average while preserving downstream QA performance, demonstrating a path toward symbolic transparency in neural models.
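For illustration only, here is a minimal sketch of how candidate surrogate programs might be scored and re-ranked against a head's observed attention. The summary does not say how IoU is computed, so this assumes each attention map is binarized at a fixed threshold and compared cell by cell over (query, key) positions. The `threshold` value, the `attention_iou` and `rank_surrogates` helpers, and the two candidate programs are all hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# An attention pattern: rows are query positions, columns are key positions.
Pattern = List[List[float]]
# Hypothetical surrogate interface: a program maps tokens to a predicted pattern.
Surrogate = Callable[[Sequence[str]], Pattern]


def binarize(pattern: Pattern, threshold: float) -> Set[Tuple[int, int]]:
    """Return the (query, key) cells whose attention weight is at least threshold."""
    return {(i, j) for i, row in enumerate(pattern)
            for j, w in enumerate(row) if w >= threshold}


def attention_iou(pred: Pattern, target: Pattern, threshold: float = 0.1) -> float:
    """IoU of the thresholded cells (an assumed definition, for illustration)."""
    a, b = binarize(pred, threshold), binarize(target, threshold)
    union = a | b
    if not union:
        return 1.0  # both maps empty after thresholding: treat as agreement
    return len(a & b) / len(union)


def rank_surrogates(candidates: Dict[str, Surrogate],
                    held_out: List[Tuple[Sequence[str], Pattern]],
                    threshold: float = 0.1) -> List[Tuple[float, str]]:
    """Sort candidate programs by mean IoU on held-out (tokens, pattern) pairs."""
    def score(program: Surrogate) -> float:
        return sum(attention_iou(program(toks), tgt, threshold)
                   for toks, tgt in held_out) / len(held_out)
    return sorted(((score(p), name) for name, p in candidates.items()),
                  reverse=True)


# Two hypothetical candidate programs for a single head.
def previous_token(tokens: Sequence[str]) -> Pattern:
    """Each query attends to the token before it (position 0 attends to itself)."""
    n = len(tokens)
    return [[1.0 if j == max(i - 1, 0) else 0.0 for j in range(n)] for i in range(n)]


def attend_to_first(tokens: Sequence[str]) -> Pattern:
    """Every query attends to the first token."""
    n = len(tokens)
    return [[1.0 if j == 0 else 0.0 for j in range(n)] for _ in range(n)]


if __name__ == "__main__":
    tokens = ["Once", "upon", "a", "time"]
    # Toy stand-in for a head's observed attention; real targets would come
    # from the model's attention weights on held-out text.
    target = previous_token(tokens)
    ranked = rank_surrogates(
        {"previous_token": previous_token, "attend_to_first": attend_to_first},
        [(tokens, target)],
    )
    print(ranked)  # [(1.0, 'previous_token'), (0.3333333333333333, 'attend_to_first')]
```

Thresholding keeps the comparison symbolic: a program only has to name which positions a head attends to, not reproduce exact softmax weights. A top-k rule per query row would be an equally plausible assumption.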