7 · OpenAI Blog · 1mo ago

Extracting Concepts from GPT-4: 16 Million Patterns via Sparse Autoencoders

OpenAI applied scaled sparse autoencoders (SAEs) to GPT-4 to automatically identify approximately 16 million interpretable features or patterns in the model's internal computations. This represents a significant scaling of mechanistic interpretability techniques previously demonstrated on smaller models. The work advances the ability to understand what concepts and representations large frontier models encode internally.
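
For readers new to sparse autoencoders, a minimal sketch of the forward pass follows. It assumes a top-k activation (keep only the k strongest latents per input), which is one common way to enforce sparsity; the summary above does not say which variant was used. The dimensions, random weights, and function names are hypothetical and chosen only for illustration.

```python
import random

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def top_k_relu(values, k):
    """Keep the k largest entries (clamped at zero) and zero out the rest."""
    keep = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    out = [0.0] * len(values)
    for i in keep:
        out[i] = max(values[i], 0.0)
    return out

def sae_forward(x, w_enc, w_dec, k):
    """Encode x into sparse latents, then reconstruct it from those latents.

    w_enc and w_dec both have shape (n_latents, d_model) in this sketch.
    """
    latents = top_k_relu(matvec(w_enc, x), k)
    recon = [
        sum(latents[j] * w_dec[j][i] for j in range(len(latents)))
        for i in range(len(x))
    ]
    return latents, recon

# Toy sizes (assumed): real SAEs use far larger activations and latent counts.
rng = random.Random(0)
d_model, n_latents, k = 8, 32, 4
w_enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_latents)]
w_dec = [row[:] for row in w_enc]  # untrained, tied weights for illustration
x = [rng.gauss(0, 1) for _ in range(d_model)]

latents, recon = sae_forward(x, w_enc, w_dec, k)
active = sum(1 for v in latents if v != 0.0)
print(f"{active} of {n_latents} latents active")
```

In a trained autoencoder, each latent that fires consistently on a recognizable concept is a candidate "feature"; the weights here are random, so the sketch shows only the sparsity mechanism, not interpretable features.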

Evaluation and Benchmarking · AI Safety Research · mechanistic interpretability · Sparse Autoencoder · OpenAI · GPT-4

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

mechanistic interpretability · Concept

Mechanistic Interpretability: Looking Inside the AI Black Box

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Related events (8)

6 · OpenAI Blog · 1mo ago

Generative modeling with sparse transformers

OpenAI introduced the Sparse Transformer, a deep neural network using a modified sparse attention mechanism to model sequences up to 30x longer than previously feasible with standard transformers. The approach sets new benchmarks on text, image, and audio generation tasks. The key algorithmic contribution is factorized sparse attention patterns that reduce the quadratic complexity of full self-attention.
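
A minimal sketch of what a factorized sparse pattern can look like, assuming a strided scheme: one attention head looks at a short local window, another looks at every `stride`-th earlier position. The function name and the toy sizes are hypothetical; the summary above does not specify the exact patterns.

```python
def strided_attention_sets(seq_len, stride):
    """Return, for each position i, the positions two sparse heads may attend to.

    local[i]   : position i and the `stride` positions before it
    strided[i] : every earlier position j with (i - j) divisible by `stride`
    Both sets are causal (j <= i).
    """
    local, strided = [], []
    for i in range(seq_len):
        local.append(list(range(max(0, i - stride), i + 1)))
        strided.append([j for j in range(i + 1) if (i - j) % stride == 0])
    return local, strided

seq_len, stride = 16, 4  # toy sizes (assumed)
local, strided = strided_attention_sets(seq_len, stride)

# Compare the number of attended pairs against full causal attention.
sparse_pairs = sum(len(set(a) | set(b)) for a, b in zip(local, strided))
dense_pairs = seq_len * (seq_len + 1) // 2
print(f"sparse pairs: {sparse_pairs}, dense causal pairs: {dense_pairs}")
```

With the stride set near the square root of the sequence length, each position attends to roughly that many others per head instead of all earlier positions, which is where the saving over quadratic attention comes from.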

Long Context Evolution · Frontier Model Releases · Sparse Transformer · sparse attention · OpenAI · +1 more

9 · OpenAI Blog · 1mo ago

Introducing GPT-5.4

OpenAI has released GPT-5.4, described as their most capable and efficient frontier model targeting professional work. The model features state-of-the-art coding, computer use, and tool search capabilities, along with a 1 million token context window. This represents a significant capability and efficiency advancement over prior GPT-5 series models.

Long Context Evolution · Frontier Model Releases · OpenAI · computer use · 1M-token context · +3 more

6 · OpenAI Blog · 1mo ago

Language models can explain neurons in language models

OpenAI uses GPT-4 to automatically generate and score natural-language explanations for the behavior of individual neurons in large language models. The methodology is applied to all neurons in GPT-2, producing a public dataset of explanations and quality scores. The authors acknowledge the explanations are imperfect, framing this as an early step toward automated mechanistic interpretability. This work establishes a scalable pipeline for neuron-level analysis that could inform future interpretability and safety research.

Evaluation and Benchmarking · AI Safety Research · GPT-2 · automated mechanistic interpretability · neuron explanation dataset · +2 more

9 · OpenAI Blog · 1mo ago

Improving Language Understanding with Unsupervised Learning (GPT-1)

OpenAI published the GPT-1 paper in June 2018, demonstrating state-of-the-art results across diverse language tasks by combining transformer architectures with unsupervised pre-training followed by supervised fine-tuning. The approach is task-agnostic and scalable, showing that pre-training on large unlabeled text corpora and then fine-tuning on specific tasks yields strong generalization. This work established the foundational paradigm that would evolve into GPT-2, GPT-3, and subsequent large language models.

Frontier Model Releases · Open Weights Progress · Transformers · GPT-1 · OpenAI · +3 more

9 · OpenAI Blog · 1mo ago

GPT-4 Release

OpenAI released GPT-4, a large multimodal model accepting image and text inputs and producing text outputs. The model demonstrates human-level performance on various professional and academic benchmarks. It represents OpenAI's latest milestone in scaling deep learning.

Frontier Model Releases · Evaluation and Benchmarking · OpenAI · GPT-4 · +1 more

6 · OpenAI Blog · 1mo ago

Understanding Neural Networks Through Sparse Circuits

OpenAI has published work on mechanistic interpretability using a sparse model approach aimed at understanding how neural networks reason internally. The research seeks to make AI systems more transparent by identifying sparse circuits within neural networks. This work is positioned as supporting safer and more reliable AI behavior through improved interpretability.

Evaluation and Benchmarking · AI Safety Research · Sparse Circuits · mechanistic interpretability · OpenAI

8 · OpenAI Blog · 1mo ago

OpenAI GPT-4.5 System Card

OpenAI has released a research preview of GPT-4.5, described as their largest and most knowledgeable model to date. The system card accompanies the model release, providing safety evaluations and capability documentation. This represents a significant step in OpenAI's model scaling trajectory between GPT-4 and any future GPT-5 release.

Frontier Model Releases · Evaluation and Benchmarking · OpenAI · GPT-4.5 System Card · GPT-4.5 · +1 more

8 · OpenAI Blog · 1mo ago

Better language models and their implications

OpenAI announced GPT-2, a large-scale unsupervised language model capable of generating coherent multi-paragraph text and achieving state-of-the-art performance on language modeling benchmarks. The model demonstrated zero-shot capability across reading comprehension, machine translation, question answering, and summarization without task-specific fine-tuning. OpenAI notably withheld the full model release citing misuse concerns, marking an early high-profile instance of staged/responsible release policy.

Frontier Model Releases · Evaluation and Benchmarking · GPT-2 · zero-shot learning · unsupervised language modeling · +3 more