Learning Concepts with Energy Functions
OpenAI presents an energy-based model capable of learning abstract spatial concepts—such as 'near,' 'above,' and 'between'—from only five demonstrations using sets of 2D points. The model generalizes across domains, transferring concepts learned in a 2D particle environment to control tasks in a 3D physics-based robot simulation. The work demonstrates few-shot concept acquisition and cross-domain transfer via energy function representations.
Implicit Generation and Generalization Methods for Energy-Based Models
OpenAI published research on stable and scalable training of energy-based models (EBMs), achieving sample quality competitive with GANs at low temperatures while retaining the mode-coverage guarantees of likelihood-based models. Rather than producing a sample in a single forward pass, the approach spends iterative computation at generation time, refining each output step by step. This work positions EBMs as a promising alternative generative modeling paradigm that bridges GANs and likelihood-based models.
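The iterative refinement mentioned above is commonly implemented in the EBM literature with Langevin dynamics: each step moves a sample down the energy gradient and adds Gaussian noise. The sketch below is illustrative only. It assumes a toy quadratic energy with an analytic gradient in place of a learned neural-network energy, and the names `energy_grad` and `langevin_sample` are hypothetical. The `temperature` parameter shows why sampling at low temperature concentrates samples near the energy minimum.

```python
import numpy as np


def energy_grad(x, mu):
    # Gradient of a toy quadratic energy E(x) = 0.5 * ||x - mu||^2
    # (a stand-in for a learned neural-network energy).
    return x - mu


def langevin_sample(mu, n_chains=4000, n_steps=500, step=0.1, temperature=1.0, seed=0):
    """Refine random starting points with unadjusted Langevin dynamics.

    Targets p(x) proportional to exp(-E(x) / temperature); lower temperatures
    pull samples closer to the energy minimum.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_chains, mu.shape[0]))  # start from Gaussian noise
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        # Gradient step on the tempered energy, plus noise scaled by sqrt(step)
        x = x - 0.5 * step * energy_grad(x, mu) / temperature + np.sqrt(step) * noise
    return x


if __name__ == "__main__":
    mu = np.array([2.0, -1.0])
    warm = langevin_sample(mu, temperature=1.0)
    cold = langevin_sample(mu, temperature=0.1)
    print("mean:", warm.mean(axis=0))
    print("per-coordinate std, T=1.0:", warm.std(axis=0).mean())
    print("per-coordinate std, T=0.1:", cold.std(axis=0).mean())
```

Both chains settle around `mu`, but the low-temperature chain has a visibly smaller spread. This spread is the trade-off between sample sharpness and diversity that temperature controls.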
OpenAI Develops Hierarchical Reinforcement Learning Algorithm for Long-Horizon Tasks
OpenAI published research on a hierarchical reinforcement learning (HRL) algorithm that learns reusable high-level actions to solve tasks requiring thousands of timesteps. Applied to navigation problems, the algorithm discovers locomotion primitives (walking, crawling in various directions) that enable rapid mastery of new tasks. The approach addresses a core challenge in RL: efficient exploration and transfer across long-horizon tasks.
Joint Energy-Based Models Reveal a Generative-Discriminative Sweet Spot for Human-Aligned Vision
Researchers use Joint Energy-Based Models (JEMs) to isolate the effect of learning objective—independent of architecture, scale, and data—on human alignment in visual representations. By varying a single mixing coefficient between discriminative and generative training, they evaluate models across six human-alignment benchmarks and find that alignment peaks at intermediate points on the generative-discriminative continuum rather than at either extreme. The results suggest that hybrid objectives combining categorical structure from discriminative learning with input-structure sensitivity from generative learning yield the most human-like visual behavior. This challenges the framing of generative vs. discriminative as a binary choice for building human-aligned vision systems.
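A JEM reuses an ordinary classifier: its logits f(x)[y] are read as negative energies -E(x, y), so the marginal energy is E(x) = -logsumexp_y f(x)[y]. The sketch below shows one hypothetical way a single coefficient `lam` could blend the discriminative and generative terms. The study's exact weighting is not given here, so `mixed_objective` and its arguments are illustrative assumptions. The generative term is the standard contrastive surrogate: mean data energy minus mean energy of model samples, which in practice would come from SGLD. Its gradient approximates that of -log p(x) when the samples are held fixed.

```python
import numpy as np


def logsumexp(a, axis=-1):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def jem_energy(logits):
    # JEM reads logits f(x)[y] as -E(x, y); marginal energy E(x) = -logsumexp_y f(x)[y].
    return -logsumexp(logits, axis=-1)


def cross_entropy(logits, labels):
    # Discriminative term: mean of -log p(y | x) under softmax(logits).
    return np.mean(logsumexp(logits, axis=-1) - logits[np.arange(len(labels)), labels])


def mixed_objective(logits_data, labels, logits_samples, lam):
    """Hypothetical single-coefficient blend of the two JEM terms.

    lam = 0 is pure classification; lam = 1 is the pure generative surrogate.
    logits_samples are classifier logits on model samples (e.g. from SGLD),
    treated as constants when differentiating.
    """
    disc = cross_entropy(logits_data, labels)
    gen = np.mean(jem_energy(logits_data)) - np.mean(jem_energy(logits_samples))
    return lam * gen + (1.0 - lam) * disc
```

Sweeping `lam` from 0 to 1 traces the generative-discriminative continuum while the architecture, data, and logits stay fixed. This isolation of the learning objective is what the study relies on.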
Generalizing from Simulation: OpenAI Sim-to-Real Robotics Transfer
OpenAI published results on sim-to-real transfer for robot controllers, demonstrating that policies trained entirely in simulation can be deployed on physical robots and respond to unplanned environmental changes. The work represents a shift from open-loop to closed-loop control systems in robotics. This is a 2017 research milestone predating current frontier model work but relevant to the historical trajectory of OpenAI's robotics program.
Learning Montezuma's Revenge from a Single Demonstration
OpenAI trained a reinforcement learning agent to achieve a score of 74,500 on Montezuma's Revenge using a single human demonstration, surpassing all previously published results. The method is simple: each episode starts from a state taken from the demonstration, beginning near its end and moving the starting point earlier as the agent learns, while PPO optimizes the game score. This approach shows that a demonstration-seeded curriculum can dramatically improve exploration in hard-exploration environments. The same PPO algorithm underpins OpenAI Five.
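The curriculum works by choosing episode start states from the demonstration. The agent first starts close to the end of the demonstration, and the start point moves earlier once the agent succeeds from there. The sketch below shows only that scheduling logic under stated assumptions. `train_and_evaluate` is a hypothetical callback standing in for a round of PPO training plus evaluation. The `threshold`, `shift`, and `max_rounds` values are illustrative, not the published settings.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def backward_curriculum(
    demo_states: Sequence[State],
    train_and_evaluate: Callable[[State], float],
    threshold: float = 0.2,
    shift: int = 1,
    max_rounds: int = 10_000,
) -> List[int]:
    """Return the demonstration indices used as start points, round by round.

    Each round trains from the current start state via the hypothetical
    train_and_evaluate callback, which returns a success rate. When that rate
    reaches `threshold`, the start point moves `shift` steps earlier.
    """
    start = len(demo_states) - 1  # begin near the end of the demonstration
    starts_used: List[int] = []
    for _ in range(max_rounds):
        starts_used.append(start)
        rate = train_and_evaluate(demo_states[start])
        if rate >= threshold:
            if start == 0:
                break  # agent now succeeds from the true initial state
            start = max(0, start - shift)
    return starts_used
```

Because each new start is only slightly earlier than one the agent has already mastered, the reward stays within reach throughout training. This is how the demonstration eases the exploration problem.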
Multimodal neurons in artificial neural networks
OpenAI researchers discovered neurons in CLIP that respond to the same concept across literal, symbolic, and conceptual representations. This finding parallels multimodal neurons previously observed in biological brains and helps explain CLIP's ability to classify unusual visual renditions of concepts. The work is presented as a step toward understanding the associations and biases learned by CLIP and similar vision-language models.
Episodic Context and Persistent 3D World Models Enable Curiosity-Driven Exploration in Photorealistic Environments
This paper addresses the failure modes of curiosity-driven RL in complex 3D environments, where agents revisit states they have forgotten and get trapped in local loops because they lack spatial persistence and episodic memory. The authors combine an online 3D reconstruction, used as a persistent world model, with a sequence-model policy over RGB observations that maintains episodic trajectory context. Trained purely via intrinsic curiosity on HM3D, the agent outperforms RL-based active mapping baselines and generalizes zero-shot to Gibson and AI-generated environments. The approach also enables efficient downstream task adaptation for apple picking and image-goal navigation.
Competitive Self-Play Enables Emergent Physical Skills in Simulated Agents
OpenAI demonstrates that competitive self-play allows simulated agents to spontaneously develop complex physical skills—tackling, ducking, faking, kicking, catching, and diving—without explicit environment design for those behaviors. The self-play dynamic automatically calibrates difficulty to the agent's current skill level. Combined with concurrent Dota 2 self-play results, OpenAI expresses confidence that self-play will be a foundational component of powerful AI systems.

