4 · OpenAI Blog · 1mo ago

Learning Concepts with Energy Functions

OpenAI presents an energy-based model that learns abstract spatial concepts such as 'near,' 'above,' and 'between' from only five demonstrations expressed as sets of 2D points. The model generalizes across domains: concepts learned in a 2D particle environment transfer to control tasks in a 3D physics-based robot simulation. The work shows that representing concepts as energy functions supports both few-shot concept acquisition and cross-domain transfer.
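To make the idea of a concept as an energy function concrete, here is a minimal sketch. It assumes a hand-written quadratic energy for 'near'; the model described above learns its energy function from demonstrations, so `near_energy`, `generate`, and `identify` are hypothetical names used only for illustration. Low energy means the concept is satisfied, so a point can be generated by descending the energy and identified by picking the lowest-energy candidate.

```python
# Hypothetical stand-in for a learned energy: low when p is close to anchor.
def near_energy(p, anchor):
    dx, dy = p[0] - anchor[0], p[1] - anchor[1]
    return dx * dx + dy * dy


def near_energy_grad(p, anchor):
    """Analytic gradient of near_energy with respect to p."""
    return (2.0 * (p[0] - anchor[0]), 2.0 * (p[1] - anchor[1]))


def generate(p, anchor, step_size=0.1, steps=100):
    """Move p downhill on the energy surface by plain gradient descent."""
    x, y = p
    for _ in range(steps):
        gx, gy = near_energy_grad((x, y), anchor)
        x -= step_size * gx
        y -= step_size * gy
    return (x, y)


def identify(points, anchor):
    """Return the index of the candidate point with the lowest energy."""
    return min(range(len(points)), key=lambda i: near_energy(points[i], anchor))
```

Calling `generate((5.0, -3.0), (1.0, 2.0))` moves the point toward the anchor, and `identify` selects whichever candidate already satisfies 'near' best. The same energy serves both uses, which is the property the summary attributes to energy-function representations.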

Agent and Tool Ecosystem · Energy-Based Models · few-shot learning · OpenAI · Cross-Domain Transfer

Related guides (2)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

4 · OpenAI Blog · 1mo ago

Implicit Generation and Generalization Methods for Energy-Based Models

OpenAI published research on stable and scalable training of energy-based models (EBMs), achieving sample quality competitive with GANs at low temperatures while retaining mode coverage guarantees of likelihood-based models. The approach uses iterative compute during generation to continually refine outputs. This work positions EBMs as a promising alternative generative modeling paradigm bridging GANs and likelihood-based models.
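The phrase "iterative compute during generation" can be illustrated with a Langevin-style sampler, a common way to draw samples from an energy-based model. This is a sketch under stated assumptions, not the method from the research: the double-well `energy`, the step size, and the function names are all hypothetical. Each step follows the negative energy gradient and adds noise scaled by the temperature, so at temperature zero the update reduces to gradient descent and samples settle onto the energy minima.

```python
import math
import random


def energy(x):
    """Toy double-well energy with minima (modes) at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def energy_grad(x):
    """Derivative of the toy energy."""
    return 4.0 * x * (x * x - 1.0)


def refine(x, steps=1000, step_size=0.01, temperature=0.0, rng=None):
    """Langevin-style refinement: gradient step on the energy plus noise."""
    rng = rng or random.Random(0)
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    for _ in range(steps):
        x = x - step_size * energy_grad(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x


def sample(n, seed=0, **kwargs):
    """Start from uniform random points and refine each one independently."""
    rng = random.Random(seed)
    starts = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    return [refine(x0, rng=rng, **kwargs) for x0 in starts]
```

With `temperature=0.0` every sample ends close to one of the two minima, while a larger temperature spreads samples around them. More refinement steps buy sharper samples, which is the trade-off the summary describes.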

Frontier Model Releases · Energy-Based Models · OpenAI · Generative Adversarial Networks

4 · OpenAI Blog · 1mo ago

OpenAI Develops Hierarchical Reinforcement Learning Algorithm for Long-Horizon Tasks

OpenAI published research on a hierarchical reinforcement learning (HRL) algorithm that learns reusable high-level actions to solve tasks requiring thousands of timesteps. Applied to navigation problems, the algorithm discovers locomotion primitives (walking, crawling in various directions) that enable rapid mastery of new tasks. The approach addresses a core challenge in RL: efficient exploration and transfer across long-horizon tasks.

Agent and Tool Ecosystem · OpenAI · Hierarchical Reinforcement Learning

6 · arXiv · cs.AI · 26d ago

Joint Energy-Based Models Reveal a Generative-Discriminative Sweet Spot for Human-Aligned Vision

Researchers use Joint Energy-Based Models (JEMs) to isolate the effect of learning objective—independent of architecture, scale, and data—on human alignment in visual representations. By varying a single mixing coefficient between discriminative and generative training, they evaluate models across six human-alignment benchmarks and find that alignment peaks at intermediate points on the generative-discriminative continuum rather than at either extreme. The results suggest that hybrid objectives combining categorical structure from discriminative learning with input-structure sensitivity from generative learning yield the most human-like visual behavior. This challenges the framing of generative vs. discriminative as a binary choice for building human-aligned vision systems.
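One way to write a single-coefficient objective of this kind is shown below. The symbol α, its direction (α = 1 purely discriminative, α = 0 purely generative), and the exact form of the combination are assumptions for illustration and may differ from the paper's formulation. The definitions of the two densities from shared logits f_θ(x) follow the standard JEM construction.

```latex
\mathcal{L}(\alpha)
  = \alpha \, \mathcal{L}_{\text{disc}} + (1 - \alpha) \, \mathcal{L}_{\text{gen}},
  \qquad \alpha \in [0, 1]

\mathcal{L}_{\text{disc}} = -\log p_\theta(y \mid x),
  \qquad
p_\theta(y \mid x) = \frac{\exp f_\theta(x)[y]}{\sum_{y'} \exp f_\theta(x)[y']}

\mathcal{L}_{\text{gen}} = -\log p_\theta(x),
  \qquad
p_\theta(x) = \frac{\sum_{y} \exp f_\theta(x)[y]}{Z(\theta)}
```

Under this reading, the reported peak at intermediate points corresponds to values of α strictly between 0 and 1.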

Evaluation and Benchmarking · Alignment and RLHF · human alignment benchmarks (perceptual similarity, gloss, robustness, shape-texture) · Joint Energy-Based Models (JEMs) · generative-discriminative continuum

4 · OpenAI Blog · 1mo ago

Generalizing from Simulation: OpenAI Sim-to-Real Robotics Transfer

OpenAI published results on sim-to-real transfer for robot controllers, demonstrating that policies trained entirely in simulation can be deployed on physical robots and respond to unplanned environmental changes. The work represents a shift from open-loop to closed-loop control systems in robotics. This is a 2017 research milestone predating current frontier model work but relevant to the historical trajectory of OpenAI's robotics program.

Agent and Tool Ecosystem · sim-to-real transfer · closed-loop control · OpenAI

5 · OpenAI Blog · 1mo ago

Learning Montezuma's Revenge from a Single Demonstration

OpenAI trained a reinforcement learning agent to achieve a score of 74,500 on Montezuma's Revenge using a single human demonstration, surpassing all previously published results. The method is straightforward: the agent plays episodes starting from carefully selected states drawn from the demonstration, optimizing game score via PPO. This approach demonstrates that imitation-seeded curriculum learning can dramatically improve exploration in hard-exploration environments. The same PPO algorithm underpins OpenAI Five.
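The summary says only that start states are "carefully selected" from the demonstration. The sketch below assumes one plausible selection rule: begin next to the end of the demonstration and move the start point earlier once the agent succeeds often enough from the current one. The function names, the 0.2 threshold, and the one-step move are all hypothetical, and the success rates are supplied as inputs rather than measured from a real agent.

```python
def next_start_index(current, success_rate, threshold=0.2, step_back=1):
    """Move the start point earlier in the demonstration once the agent
    succeeds often enough from the current start point (assumed rule)."""
    if success_rate >= threshold:
        return max(0, current - step_back)
    return current


def run_curriculum(demo_length, success_rates, threshold=0.2):
    """Replay a sequence of success rates (hypothetical inputs) and return
    the demonstration index used as the start state at each iteration."""
    start = demo_length - 1  # begin next to the end of the demonstration
    history = []
    for rate in success_rates:
        history.append(start)
        start = next_start_index(start, rate, threshold)
    return history
```

For a five-step demonstration, `run_curriculum(5, [0.5, 0.1, 0.3, 0.9])` keeps the start index fixed while the success rate is below the threshold and moves it earlier otherwise. In a full training loop, each iteration would reset the environment to the demonstration state at that index and optimize the game score with PPO.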

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI Five · PPO · OpenAI

5 · OpenAI Blog · 1mo ago

Multimodal neurons in artificial neural networks

OpenAI researchers discovered neurons in CLIP that respond to the same concept across literal, symbolic, and conceptual representations. This finding parallels multimodal neurons previously observed in biological brains and helps explain CLIP's ability to classify unusual visual renditions of concepts. The work is presented as a step toward understanding the associations and biases learned by CLIP and similar vision-language models.

AI Safety Research · Multimodal Progress · OpenAI · multimodal neurons · CLIP

6 · arXiv · cs.LG · 29d ago

Episodic Context and Persistent 3D World Models Enable Curiosity-Driven Exploration in Photorealistic Environments

This paper addresses the failure modes of curiosity-driven RL in complex 3D environments, where agents revisit forgotten states and get trapped in local loops due to lacking spatial persistence and episodic memory. The authors combine an online 3D reconstruction as a persistent world model with a sequence-model policy over RGB observations to maintain episodic trajectory context. Trained purely via intrinsic curiosity on HM3D, the agent outperforms RL-based active mapping baselines and zero-shot generalizes to Gibson and AI-generated environments. The approach also enables efficient downstream task adaptation for apple picking and image-goal navigation.

Evaluation and Benchmarking · Agent and Tool Ecosystem · online 3D reconstruction · curiosity-driven reinforcement learning · Remember to be Curious

5 · OpenAI Blog · 1mo ago

Competitive Self-Play Enables Emergent Physical Skills in Simulated Agents

OpenAI demonstrates that competitive self-play allows simulated agents to spontaneously develop complex physical skills—tackling, ducking, faking, kicking, catching, and diving—without explicit environment design for those behaviors. The self-play dynamic automatically calibrates difficulty to the agent's current skill level. Combined with concurrent Dota 2 self-play results, OpenAI expresses confidence that self-play will be a foundational component of powerful AI systems.

Agent and Tool Ecosystem · Alignment and RLHF · Dota 2 · Competitive Self-Play · OpenAI