Extracting Concepts from GPT-4: 16 Million Patterns via Sparse Autoencoders
OpenAI applied scaled sparse autoencoders (SAEs) to GPT-4 to automatically identify approximately 16 million interpretable features or patterns in the model's internal computations. This represents a significant scaling of mechanistic interpretability techniques previously demonstrated on smaller models. The work advances the ability to understand what concepts and representations large frontier models encode internally.
Related guides (3)
Related events (8)
Generative modeling with sparse transformers
OpenAI introduced the Sparse Transformer, a deep neural network using a modified sparse attention mechanism to model sequences up to 30x longer than previously feasible with standard transformers. The approach sets new benchmarks on text, image, and audio generation tasks. The key algorithmic contribution is factorized sparse attention patterns that reduce the quadratic complexity of full self-attention.
Introducing GPT-5.4
OpenAI has released GPT-5.4, described as their most capable and efficient frontier model targeting professional work. The model features state-of-the-art coding, computer use, and tool search capabilities, along with a 1 million token context window. This represents a significant capability and efficiency advancement over prior GPT-5 series models.
Language models can explain neurons in language models
OpenAI uses GPT-4 to automatically generate and score natural-language explanations for the behavior of individual neurons in large language models. The methodology is applied to all neurons in GPT-2, producing a public dataset of explanations and quality scores. The authors acknowledge the explanations are imperfect, framing this as an early step toward automated mechanistic interpretability. This work establishes a scalable pipeline for neuron-level analysis that could inform future interpretability and safety research.
Improving Language Understanding with Unsupervised Learning (GPT-1)
OpenAI published the GPT-1 paper in June 2018, demonstrating state-of-the-art results across diverse language tasks by combining transformer architectures with unsupervised pre-training followed by supervised fine-tuning. The approach is task-agnostic and scalable, showing that pre-training on large unlabeled text corpora and then fine-tuning on specific tasks yields strong generalization. This work established the foundational paradigm that would evolve into GPT-2, GPT-3, and subsequent large language models.
GPT-4 Release
OpenAI released GPT-4, a large multimodal model accepting image and text inputs and producing text outputs. The model demonstrates human-level performance on various professional and academic benchmarks. It represents OpenAI's latest milestone in scaling deep learning.
Understanding Neural Networks Through Sparse Circuits
OpenAI has published work on mechanistic interpretability using a sparse model approach aimed at understanding how neural networks reason internally. The research seeks to make AI systems more transparent by identifying sparse circuits within neural networks. This work is positioned as supporting safer and more reliable AI behavior through improved interpretability.
OpenAI GPT-4.5 System Card
OpenAI has released a research preview of GPT-4.5, described as their largest and most knowledgeable model to date. The system card accompanies the model release, providing safety evaluations and capability documentation. This represents a significant step in OpenAI's model scaling trajectory between GPT-4 and any future GPT-5 release.
Better language models and their implications
OpenAI announced GPT-2, a large-scale unsupervised language model capable of generating coherent multi-paragraph text and achieving state-of-the-art performance on language modeling benchmarks. The model demonstrated zero-shot capability across reading comprehension, machine translation, question answering, and summarization without task-specific fine-tuning. OpenAI notably withheld the full model release citing misuse concerns, marking an early high-profile instance of staged/responsible release policy.


