Image GPT: Transformer Models Applied to Pixel Sequences for Image Generation and Classification
OpenAI demonstrates that a large transformer model trained autoregressively on pixel sequences can generate coherent image completions and samples, analogous to text generation. The work establishes a correlation between generative sample quality and downstream image classification accuracy. The best generative model achieves features competitive with top convolutional networks in the unsupervised setting, suggesting shared representational principles across modalities.
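As a rough illustration of what "trained autoregressively on pixel sequences" means, the sketch below flattens a tiny image into a 1D sequence in raster order, scores it as a sum of next-pixel log-probabilities, and completes a partial image by sampling one pixel at a time. The `uniform_model` function is a hypothetical stand-in for a trained transformer, and the 4-value pixel vocabulary is an assumption chosen for brevity; neither reflects the actual Image GPT configuration.

```python
import math
import random

def flatten_raster(image):
    """Flatten a 2D grid of pixel values into a 1D sequence in raster order."""
    return [px for row in image for px in row]

def uniform_model(prefix, vocab_size):
    """Hypothetical stand-in for a trained model: uniform next-pixel distribution."""
    return [1.0 / vocab_size] * vocab_size

def sequence_log_likelihood(pixels, model, vocab_size):
    """Sum log p(x_i | x_<i) over the sequence (autoregressive factorization)."""
    total = 0.0
    for i, px in enumerate(pixels):
        probs = model(pixels[:i], vocab_size)
        total += math.log(probs[px])
    return total

def complete(prefix, length, model, vocab_size, rng):
    """Extend a partial pixel sequence by sampling one pixel at a time."""
    seq = list(prefix)
    while len(seq) < length:
        probs = model(seq, vocab_size)
        seq.append(rng.choices(range(vocab_size), weights=probs)[0])
    return seq

image = [[0, 1], [2, 3]]            # toy 2x2 image with 4 possible pixel values
pixels = flatten_raster(image)
ll = sequence_log_likelihood(pixels, uniform_model, vocab_size=4)
completion = complete(pixels[:2], len(pixels), uniform_model, 4, random.Random(0))
```

With a real model in place of `uniform_model`, the same loop would produce image completions conditioned on the top half of an image, in the same way a language model continues a text prompt.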
Related events (8)
Introducing 4o Image Generation
OpenAI has integrated a native image generation capability directly into GPT-4o, positioning it as a primary model capability rather than a separate system. The announcement frames this as their most advanced image generator to date, emphasizing both aesthetic quality and practical utility. This represents a shift toward unified multimodal models that generate images natively rather than relying on separate diffusion-based pipelines.
OpenAI Launches gpt-image-1 Image Generation Model via API
OpenAI has made its latest image generation model, gpt-image-1, available through its API for developers and businesses. The model is positioned for professional-grade, customizable visual generation integrated directly into third-party tools and platforms. This follows OpenAI's earlier consumer-facing image generation features and extends them to programmatic access.
Introducing ChatGPT Images 2.0
OpenAI has launched ChatGPT Images 2.0, a new image generation model integrated into ChatGPT. The release highlights improved text rendering, multilingual support, and advanced visual reasoning capabilities. This represents an upgrade to OpenAI's consumer-facing image generation offering.
Generative modeling with sparse transformers
OpenAI introduced the Sparse Transformer, a deep neural network that uses a sparse attention mechanism to model sequences up to 30x longer than previously feasible with standard transformers. The approach set new benchmarks on text, image, and audio generation tasks. The key algorithmic contribution is a set of factorized sparse attention patterns that reduce the cost of self-attention from quadratic in sequence length to roughly O(n√n).
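To make the idea of a factorized sparse attention pattern concrete, here is a minimal sketch of one such factorization, a "strided" pattern split across two causal masks: one covering a local window and one covering every `stride`-th earlier position. The function names, the toy sizes, and the exact window definition are assumptions for illustration, not the blurb's or the paper's precise formulation.

```python
def strided_masks(n, stride):
    """Build two causal boolean masks of shape n x n.

    local[i][j]   is True when j is one of the `stride` most recent positions up to i.
    strided[i][j] is True when j <= i and (i - j) is a multiple of `stride`.
    """
    local = [[j <= i and i - j < stride for j in range(n)] for i in range(n)]
    strided = [[j <= i and (i - j) % stride == 0 for j in range(n)] for i in range(n)]
    return local, strided

def count_allowed(mask):
    """Number of (query, key) pairs the mask permits."""
    return sum(sum(row) for row in mask)

def reachable_in_two_hops(local, strided, n):
    """Check that every earlier position j is reachable from i via strided then local."""
    for i in range(n):
        for j in range(i + 1):
            if not any(strided[i][k] and local[k][j] for k in range(n)):
                return False
    return True

n, stride = 16, 4                      # toy sizes; stride is typically near sqrt(n)
local, strided = strided_masks(n, stride)
dense_pairs = n * (n + 1) // 2         # pairs allowed by full causal attention
sparse_pairs = count_allowed(local) + count_allowed(strided)
full_coverage = reachable_in_two_hops(local, strided, n)
```

Each mask permits far fewer pairs than dense causal attention, yet composing the two still lets information flow from any earlier position to any later one in two steps, which is the intuition behind the reduced cost.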
Meta Research Improves Image Generation via Staged Planning and Self-Revision Fine-Tuning
Researchers from Meta and collaborating universities propose a fine-tuning method that teaches image generators to compose images through discrete plan-sketch-inspect-refine cycles rather than generating all at once. Starting from BAGEL-7B, they construct ~62,000 training examples using GPT-4o and FLUX.1 Kontext to supervise each stage, achieving 83% on GenEval versus 77% for the base model and a competing method (PARM) that required 11x more training data and ~8x more inference steps. The approach improves spatial relationship accuracy, object attribute fidelity, and real-world knowledge grounding in generated images.
OpenAI Launches GPT-Image-1.5 via ChatGPT Images Update
OpenAI has rolled out an upgraded image generation model, GPT-Image-1.5, to all ChatGPT users and via the API. The update promises more precise edits, more consistent details, and up to 4× faster image generation compared to the previous version. The rollout is global and simultaneous across consumer and API access tiers.
Improving Language Understanding with Unsupervised Learning (GPT-1)
OpenAI published the GPT-1 paper in June 2018, demonstrating state-of-the-art results across diverse language tasks by combining transformer architectures with unsupervised pre-training followed by supervised fine-tuning. The approach is task-agnostic and scalable, showing that pre-training on large unlabeled text corpora and then fine-tuning on specific tasks yields strong generalization. This work established the foundational paradigm that would evolve into GPT-2, GPT-3, and subsequent large language models.
[AINews] ImageGen is on the Path to AGI
A Latent Space commentary piece reflects on the continued explosion of GPT-Image-2 usage and its broader implications for AI capabilities. The piece frames recent image generation advances as significant steps on a trajectory toward AGI. Published as part of the AINews series, this is a tier-2 commentary source synthesizing recent developments around GPT-Image-2.


