model

TinyLlama-1.1B

model · active · provisional · tinyllama-1-1b-ab264c2b · 1 event · first seen 3d ago

Aliases: TinyLlama-1.1B

Co-occurring entities

Llama 3.2 · GPT-2 · Explaining Attention with Program Synthesis · TinyStories

More like this (12)

Llama-3.1-8B · Llama 1B · Meta Llama 3.1 405B · Llama-3 · Llama 3.2 · Llama 3.1 70B · BioLlama3 · Llama 2 70B · Llama 2 · Llama · Llama 3 · Llama 3.2 11B Vision

Recent events (1)

6 · arXiv · cs.LG · 3d ago

Program synthesis used to reverse-engineer transformer attention heads with executable Python surrogates

Researchers propose a pipeline that approximates transformer attention heads with executable Python programs, which are generated by a language model and then re-ranked by held-out predictive accuracy. Applied to GPT-2, TinyLlama-1.1B, and Llama-3B, the pipeline uses fewer than 1,000 programs to reproduce attention patterns with more than 75% average IoU (intersection-over-union) similarity on TinyStories. Replacing 25% of attention heads with programmatic surrogates raises perplexity by only 16% on average while preserving downstream QA performance, demonstrating a path toward symbolic transparency in neural models.

Evaluation and Benchmarking · AI Safety Research · Llama 3.2 · GPT-2 · Explaining Attention with Program Synthesis · +2 more
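
The re-ranking step described in the summary above can be illustrated with a small, self-contained sketch. It is not the paper's implementation. The summary does not say how IoU is computed, so this sketch assumes attention weights are binarized at a threshold of 0.1 before comparison. The two candidate programs (`previous_token`, `first_token`), the toy attention matrix, and the `rerank` helper are all hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
# A "program" maps a token sequence to a predicted attention matrix.
Program = Callable[[Sequence[str]], Matrix]


def iou(pred: Matrix, obs: Matrix, threshold: float = 0.1) -> float:
    """IoU of two attention maps after binarizing at `threshold`.

    The thresholding rule is an assumption of this sketch, not taken
    from the source.
    """
    inter = union = 0
    for row_p, row_o in zip(pred, obs):
        for p, o in zip(row_p, row_o):
            a, b = p >= threshold, o >= threshold
            inter += int(a and b)
            union += int(a or b)
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def previous_token(tokens: Sequence[str]) -> Matrix:
    """Hypothetical candidate: each position attends to the one before it."""
    n = len(tokens)
    return [[1.0 if j == max(i - 1, 0) else 0.0 for j in range(n)]
            for i in range(n)]


def first_token(tokens: Sequence[str]) -> Matrix:
    """Hypothetical candidate: every position attends to position 0."""
    n = len(tokens)
    return [[1.0 if j == 0 else 0.0 for j in range(n)] for i in range(n)]


def rerank(programs: Sequence[Program],
           held_out: Sequence[Tuple[Sequence[str], Matrix]]
           ) -> List[Tuple[float, str]]:
    """Score each candidate by mean IoU on held-out examples, best first."""
    scored = []
    for program in programs:
        scores = [iou(program(toks), attn) for toks, attn in held_out]
        scored.append((sum(scores) / len(scores), program.__name__))
    return sorted(scored, reverse=True)


# Toy held-out example: a head that mostly attends to the previous token.
tokens = ["Once", "upon", "a", "time"]
observed = [
    [1.00, 0.00, 0.00, 0.00],
    [0.95, 0.05, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
]

ranking = rerank([first_token, previous_token], [(tokens, observed)])
best_score, best_name = ranking[0]
```

On this toy input the `previous_token` candidate matches the binarized pattern exactly, so it ranks first. A real pipeline would score many generated programs over many held-out sequences, and the choice of similarity measure and threshold would need to follow the paper.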