6 · arXiv · cs.LG (Machine Learning) · 2d ago

Program synthesis used to reverse-engineer transformer attention heads with executable Python surrogates

Researchers propose a pipeline that approximates transformer attention heads with executable Python programs: a language model generates candidate programs, which are then re-ranked by predictive accuracy on held-out data. Applied to GPT-2, TinyLlama-1.1B, and Llama 3.2 (3B), fewer than 1,000 programs reproduce attention patterns with more than 75% average intersection-over-union (IoU) similarity on TinyStories. Replacing 25% of attention heads with programmatic surrogates raises perplexity by only 16% on average while preserving downstream QA performance, which suggests a path toward symbolic transparency in neural models.

Evaluation and Benchmarking · AI Safety Research · Llama 3.2 · GPT-2 · Explaining Attention with Program Synthesis · TinyStories · TinyLlama-1.1B
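The re-ranking step described in the summary above can be illustrated with a small sketch. It assumes that attention weights are binarized with a threshold before IoU is computed and that each candidate is a plain Python function mapping tokens to an attention matrix. The function names (`previous_token_head`, `first_token_head`, `iou`, `rank_programs`), the 0.5 threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
Program = Callable[[Sequence[str]], Matrix]


def previous_token_head(tokens: Sequence[str]) -> Matrix:
    """Hypothetical surrogate: each position attends to the token before it."""
    n = len(tokens)
    attn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        attn[i][max(i - 1, 0)] = 1.0  # position 0 attends to itself
    return attn


def first_token_head(tokens: Sequence[str]) -> Matrix:
    """Hypothetical surrogate: every position attends to position 0."""
    n = len(tokens)
    return [[1.0 if j == 0 else 0.0 for j in range(n)] for _ in range(n)]


def iou(pred: Matrix, target: Matrix, threshold: float = 0.5) -> float:
    """Intersection-over-union of the cells whose weight exceeds `threshold`."""
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_on, t_on = p > threshold, t > threshold
            inter += p_on and t_on
            union += p_on or t_on
    return inter / union if union else 1.0


def rank_programs(
    programs: List[Program], held_out: List[Tuple[Sequence[str], Matrix]]
) -> List[Program]:
    """Sort candidates by mean IoU on held-out (tokens, attention) pairs."""
    def score(prog: Program) -> float:
        return sum(iou(prog(toks), attn) for toks, attn in held_out) / len(held_out)
    return sorted(programs, key=score, reverse=True)


# Toy held-out example: an observed pattern that mostly follows the previous token.
tokens = ["the", "cat", "sat", "down"]
observed = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.0],
]
best = rank_programs([first_token_head, previous_token_head], [(tokens, observed)])[0]
print(best.__name__)
```

Scoring on held-out data rather than on the examples used to generate the programs is what keeps a candidate from being rewarded for memorizing a handful of attention maps.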

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

6 · arXiv · cs.AI · 11d ago

Frontier coding agents use metaprogramming to handle esoteric programming languages

A new arXiv paper evaluates six LLM-based coding agents on four esoteric programming languages (including Brainfuck and Befunge-98), finding that the strongest agents—Claude Opus 4.6 and GPT-5.4 xhigh—often avoid writing the target language directly, instead generating it via Python metaprograms. Forbidding this strategy causes large performance drops, and text guidance alone does not transfer the capability to weaker models, though sharing Opus-derived Python helper code does sharply improve mid-tier agents. The study reveals capability stratification that mainstream benchmarks like SWE-Bench Verified compress into narrow bands, suggesting frontier agents succeed by constructing and debugging working models of unfamiliar environments rather than pattern-matching to training data.

Frontier Model Releases · Evaluation and Benchmarking · Claude Sonnet 4 · Claude Opus 4.6 · SWE-Bench Verified
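The metaprogramming strategy described above, generating the target language from Python rather than writing it by hand, can be sketched for Brainfuck. This is a minimal illustration, not code from the paper or from any agent: `bf_print` and `run_bf` are hypothetical helpers, the generator handles only ASCII text and uses a single tape cell, and the interpreter omits the input command.

```python
def bf_print(text: str) -> str:
    """Emit Brainfuck that prints `text`, adjusting one cell by deltas."""
    out, current = [], 0
    for ch in text:
        delta = ord(ch) - current
        out.append(("+" if delta >= 0 else "-") * abs(delta) + ".")
        current = ord(ch)
    return "".join(out)


def run_bf(code: str) -> str:
    """Minimal interpreter (no ',' input), used only to check generated code."""
    tape, ptr, pc, out = [0] * 30000, 0, 0, []
    # Precompute matching bracket positions for loops.
    stack, jump = [], {}
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i
    while pc < len(code):
        c = code[pc]
        if c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]
        pc += 1
    return "".join(out)


program = bf_print("Hi!")
print(run_bf(program))
```

Pairing the generator with an interpreter gives a tight build-and-check loop, which is one plausible reading of what the summary calls constructing and debugging a working model of an unfamiliar environment.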

6 · arXiv · cs.LG · 19d ago

Positional vs. Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

Researchers train a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks to study how attention heads specialize into positional or symbolic roles during learning. They find that successful task learning correlates with the emergence of 'pure' heads—exclusively positional or symbolic—and provide theoretical constructions showing how single-layer RoPE-based attention realizes these functions geometrically. A novel 'discrepancy' metric formalizes the robustness difference between the two head types, with symbolic mechanisms shown to extrapolate more reliably to longer sequences than positional ones. The findings have implications for understanding length generalization failures in RoPE-based models.

Long Context Evolution · Evaluation and Benchmarking · Transformers · multi-hop reasoning · Rotary Position Embedding (RoPE)
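As background for the RoPE geometry mentioned above, the sketch below checks numerically that a RoPE-rotated query-key score depends only on the offset between the two positions, which is the property a purely positional head can exploit. It is not the paper's construction: the `rope` and `dot` helpers, the toy vectors, and the choice to rotate consecutive coordinate pairs (some implementations pair the two halves of the vector instead) are assumptions made for illustration.

```python
import math
from typing import List


def rope(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by the angle pos * theta_k."""
    d = len(vec)
    out: List[float] = []
    for k in range(0, d, 2):
        theta = base ** (-k / d)  # one frequency per coordinate pair
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x, y = vec[k], vec[k + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.2]

# Same offset (3 positions apart) at two different absolute locations.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(abs(score_near - score_far) < 1e-9)
```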

5 · The Batch · 19d ago

Researchers at UT-Austin and Google Model Human Decision-Making in Rock-Paper-Scissors

Researchers from UT-Austin and Google used AlphaEvolve, an evolutionary code-optimization method, to synthesize interpretable Python programs that predict move-by-move decisions of LLMs and humans playing rock-paper-scissors against bots. They found that Gemini 2.5 Pro, Gemini 2.5 Flash, and GPT-4.1 share similar sequential-pattern-tracking strategies that are more systematic than typical human play, while GPT-OSS 120B and humans relied on simpler opponent-move-frequency heuristics. The study demonstrates that code synthesis from behavioral data can serve as an interpretability tool for LLM decision-making, revealing that LLMs do not simply mimic human strategies.

Evaluation and Benchmarking · AI Safety Research · Google · Gemini-2.5-Flash-Lite · AlphaEvolve
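To make the phrase "opponent-move-frequency heuristic" concrete, here is a minimal sketch of what such a program might look like. It is a hypothetical illustration, not one of the programs synthesized in the study; the name `frequency_strategy`, the opening move, and the tie-breaking behavior are assumptions.

```python
from collections import Counter
from typing import List

# Maps each move to the move that beats it.
BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def frequency_strategy(opponent_history: List[str]) -> str:
    """Play the move that beats the opponent's most frequent past move."""
    if not opponent_history:
        return "rock"  # arbitrary opening move
    most_common, _ = Counter(opponent_history).most_common(1)[0]
    return BEATEN_BY[most_common]


print(frequency_strategy(["rock", "rock", "scissors", "rock"]))
```

A sequential-pattern tracker would condition on recent move sequences rather than on overall counts, which is the distinction the summary draws between the two groups of players.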

6 · OpenAI Blog · 1mo ago

Generative modeling with sparse transformers

OpenAI introduced the Sparse Transformer, a deep neural network that uses a modified sparse attention mechanism to model sequences up to 30x longer than was previously feasible with standard transformers. The approach set new benchmark results on text, image, and audio generation tasks. The key algorithmic contribution is a set of factorized sparse attention patterns that reduce the cost of full self-attention from O(n²) to O(n√n).

Long Context Evolution · Frontier Model Releases · Sparse Transformer · sparse attention · OpenAI
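The idea of a factorized attention pattern can be sketched with two causal masks in the spirit of the strided pattern: one head looks at a local window of recent positions, the other at every `stride`-th earlier position. This is a simplified illustration rather than OpenAI's implementation; `strided_masks`, `count_allowed`, and the sizes are assumptions.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def strided_masks(n: int, stride: int) -> Tuple[Mask, Mask]:
    """Return two causal boolean masks: a local window and strided columns."""
    local = [[j <= i and i - j < stride for j in range(n)] for i in range(n)]
    strided = [[j <= i and (i - j) % stride == 0 for j in range(n)] for i in range(n)]
    return local, strided


def count_allowed(mask: Mask) -> int:
    """Number of (query, key) pairs the mask lets through."""
    return sum(sum(row) for row in mask)


n, stride = 16, 4  # stride chosen near sqrt(n)
local, strided = strided_masks(n, stride)
dense = n * (n + 1) // 2  # cells in a full causal mask
print(count_allowed(local), count_allowed(strided), dense)
```

With the stride near √n, each row of either mask allows roughly √n keys, so the total work grows like n√n instead of n², while any earlier position remains reachable by composing the two patterns.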

6 · OpenAI Blog · 1mo ago

Image GPT: Transformer Models Applied to Pixel Sequences for Image Generation and Classification

OpenAI demonstrates that a large transformer model trained autoregressively on pixel sequences can generate coherent image completions and samples, analogous to text generation. The work establishes a correlation between generative sample quality and downstream image classification accuracy. The best generative model achieves features competitive with top convolutional networks in the unsupervised setting, suggesting shared representational principles across modalities.

Frontier Model Releases · Multimodal Progress · Transformers · convolutional neural network · OpenAI

6 · Hugging Face Blog · 1mo ago

Making LLMs lighter with AutoGPTQ and transformers

Hugging Face announces native integration of AutoGPTQ into the transformers library, enabling 4-bit quantized inference for large language models. The integration allows users to load and run GPTQ-quantized models directly through the standard transformers API with minimal code changes. This lowers the hardware barrier for deploying LLMs by significantly reducing VRAM requirements while maintaining competitive performance.

Open Weights Progress · Inference Economics · Transformers · Hugging Face · AutoGPTQ

4 · Hugging Face Blog · 1mo ago

The Reformer - Pushing the limits of language modeling

This Hugging Face blog post covers the Reformer, a memory-efficient transformer architecture that uses locality-sensitive hashing (LSH) attention and reversible residual layers to handle very long sequences. The post explains the technical mechanisms that allow Reformer to process sequences up to 1 million tokens with significantly reduced memory footprint compared to standard transformers. It serves as an educational deep-dive into the architectural innovations introduced in the original Reformer paper by Kitaev et al.

Training Infrastructure · Long Context Evolution · Nikita Kitaev · Hugging Face · Reformer
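The bucketing step behind LSH attention can be sketched as follows: vectors are projected with a random matrix R, and the bucket is the argmax over the concatenation [xR, -xR], so vectors pointing in similar directions tend to share a bucket and attention is then restricted to positions within the same bucket. This is a minimal pure-Python illustration, not the Reformer code; `make_projection`, `lsh_bucket`, and the sizes are assumptions.

```python
import random
from typing import List


def make_projection(dim: int, n_buckets: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian matrix of shape (dim, n_buckets // 2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)] for _ in range(dim)]


def lsh_bucket(x: List[float], proj: List[List[float]]) -> int:
    """Angular LSH: index of the largest entry in the concatenation [xR, -xR]."""
    xr = [sum(x[i] * proj[i][j] for i in range(len(x))) for j in range(len(proj[0]))]
    scores = xr + [-v for v in xr]
    return max(range(len(scores)), key=scores.__getitem__)


proj = make_projection(dim=4, n_buckets=8)
a = [1.0, 0.5, -0.2, 0.3]
same_direction = lsh_bucket(a, proj) == lsh_bucket([2 * v for v in a], proj)
opposite_direction = lsh_bucket(a, proj) != lsh_bucket([-v for v in a], proj)
print(same_direction, opposite_direction)
```

Because the hash depends only on direction, rescaling a vector never changes its bucket, while negating it always moves it to the mirrored half of the bucket range.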

5 · Hugging Face Blog · 1mo ago

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

A Hugging Face blog post discusses inference optimization techniques derived from OpenAI's gpt-oss codebase that can be applied within the Hugging Face Transformers library. The post appears to cover practical tricks for improving transformer inference speed or efficiency. It reads as a practitioner-oriented technical guide that bridges OpenAI's methods and the open-source ecosystem.

Open Weights Progress · Inference Economics · GPT-OSS · Hugging Face Transformers · Hugging Face