Almanac
← Events
7OpenAI Blog·1mo ago

Extracting Concepts from GPT-4: 16 Million Patterns via Sparse Autoencoders

OpenAI applied scaled sparse autoencoders (SAEs) to GPT-4 to automatically identify approximately 16 million interpretable features or patterns in the model's internal computations. This represents a significant scaling of mechanistic interpretability techniques previously demonstrated on smaller models. The work advances the ability to understand what concepts and representations large frontier models encode internally.

Related guides (3)

Related events (8)

6Openai Blog·1mo ago·source ↗

Generative modeling with sparse transformers

OpenAI introduced the Sparse Transformer, a deep neural network using a modified sparse attention mechanism to model sequences up to 30x longer than previously feasible with standard transformers. The approach sets new benchmarks on text, image, and audio generation tasks. The key algorithmic contribution is factorized sparse attention patterns that reduce the quadratic complexity of full self-attention.

9Openai Blog·1mo ago·source ↗

Introducing GPT-5.4

OpenAI has released GPT-5.4, described as their most capable and efficient frontier model targeting professional work. The model features state-of-the-art coding, computer use, and tool search capabilities, along with a 1 million token context window. This represents a significant capability and efficiency advancement over prior GPT-5 series models.

6Openai Blog·1mo ago·source ↗

Language models can explain neurons in language models

OpenAI uses GPT-4 to automatically generate and score natural-language explanations for the behavior of individual neurons in large language models. The methodology is applied to all neurons in GPT-2, producing a public dataset of explanations and quality scores. The authors acknowledge the explanations are imperfect, framing this as an early step toward automated mechanistic interpretability. This work establishes a scalable pipeline for neuron-level analysis that could inform future interpretability and safety research.

9Openai Blog·1mo ago·source ↗

Improving Language Understanding with Unsupervised Learning (GPT-1)

OpenAI published the GPT-1 paper in June 2018, demonstrating state-of-the-art results across diverse language tasks by combining transformer architectures with unsupervised pre-training followed by supervised fine-tuning. The approach is task-agnostic and scalable, showing that pre-training on large unlabeled text corpora and then fine-tuning on specific tasks yields strong generalization. This work established the foundational paradigm that would evolve into GPT-2, GPT-3, and subsequent large language models.

9Openai Blog·1mo ago·source ↗

GPT-4 Release

OpenAI released GPT-4, a large multimodal model accepting image and text inputs and producing text outputs. The model demonstrates human-level performance on various professional and academic benchmarks. It represents OpenAI's latest milestone in scaling deep learning.

6Openai Blog·1mo ago·source ↗

Understanding Neural Networks Through Sparse Circuits

OpenAI has published work on mechanistic interpretability using a sparse model approach aimed at understanding how neural networks reason internally. The research seeks to make AI systems more transparent by identifying sparse circuits within neural networks. This work is positioned as supporting safer and more reliable AI behavior through improved interpretability.

8Openai Blog·1mo ago·source ↗

OpenAI GPT-4.5 System Card

OpenAI has released a research preview of GPT-4.5, described as their largest and most knowledgeable model to date. The system card accompanies the model release, providing safety evaluations and capability documentation. This represents a significant step in OpenAI's model scaling trajectory between GPT-4 and any future GPT-5 release.

8Openai Blog·1mo ago·source ↗

Better language models and their implications

OpenAI announced GPT-2, a large-scale unsupervised language model capable of generating coherent multi-paragraph text and achieving state-of-the-art performance on language modeling benchmarks. The model demonstrated zero-shot capability across reading comprehension, machine translation, question answering, and summarization without task-specific fine-tuning. OpenAI notably withheld the full model release citing misuse concerns, marking an early high-profile instance of staged/responsible release policy.