4 · OpenAI Blog · 1mo ago

One-shot imitation learning

OpenAI published research on one-shot imitation learning, a technique enabling agents to learn new tasks from a single demonstration. The approach allows a policy network to observe a demonstration and immediately generalize to new instances of the same task without additional training. This was an early contribution to the field of meta-learning and few-shot generalization in robotics and sequential decision-making.

Agent and Tool Ecosystem · Alignment and RLHF · One-Shot Imitation Learning · meta-learning · Imitation Learning · OpenAI

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Alignment and RLHF · Topic guide

Alignment and RLHF: Teaching AI Models to Behave

Related events (8)

5 · OpenAI Blog · 1mo ago

Learning Montezuma's Revenge from a Single Demonstration

OpenAI trained a reinforcement learning agent to achieve a score of 74,500 on Montezuma's Revenge using a single human demonstration, surpassing all previously published results. The method is straightforward: the agent plays episodes starting from carefully selected states drawn from the demonstration, optimizing game score via PPO. This approach demonstrates that imitation-seeded curriculum learning can dramatically improve exploration in hard-exploration environments. The same PPO algorithm underpins OpenAI Five.

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI Five · PPO · OpenAI · +1 more
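
The start-state curriculum described in the summary above can be sketched roughly as follows. This is a minimal illustration, not the published implementation: it assumes a backward schedule in which episodes begin near the end of the demonstration and the start point moves earlier once the agent matches the demonstrator's remaining score often enough. The function names, the `success_threshold` of 0.2, and the `step_back` of 1 are illustrative assumptions, and the PPO training loop itself is omitted.

```python
def demo_return_to_go(demo_rewards, start_index):
    """Score the demonstrator collected from start_index to the end of the demo."""
    return sum(demo_rewards[start_index:])


def next_start_index(start_index, rollout_returns, demo_rewards,
                     success_threshold=0.2, step_back=1):
    """Pick the demonstration index from which the next batch of episodes starts.

    rollout_returns holds the scores the agent obtained when starting from
    start_index. If a large enough fraction of them match or beat the
    demonstrator's remaining score, the start point moves earlier in the demo.
    The threshold and step size are assumed values for illustration.
    """
    target = demo_return_to_go(demo_rewards, start_index)
    successes = sum(1 for r in rollout_returns if r >= target)
    if successes / len(rollout_returns) >= success_threshold:
        return max(0, start_index - step_back)
    return start_index


# Hypothetical demonstration with rewards at two of its five steps.
demo_rewards = [0, 100, 0, 0, 300]
start = next_start_index(4, [300, 0, 0, 0, 0], demo_rewards)
```

Starting close to the reward keeps the exploration problem short at first; each move backward only asks the agent to reach a state it has already learned to exploit.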

4 · arXiv · cs.LG · 3d ago

Imitation learning technique infers red agent policy in partially observable cyber-defense environments

Researchers propose a Policy Learning Technique using imitation learning to infer attacker (red agent) policies from network observations and defender actions in partially observable autonomous cyber environments. The method integrates with neurosymbolic cyber-defense agents that use behavior trees with learning-enabled components. Evaluated across diverse simulated scenarios, the approach achieves high prediction accuracy for red agent actions, improving the defender's ability to anticipate intrusions.

AI Safety Research · Agent and Tool Ecosystem · Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents · behavior trees with learning-enabled components

4 · OpenAI Blog · 1mo ago

Generalizing from Simulation: OpenAI Sim-to-Real Robotics Transfer

OpenAI published results on sim-to-real transfer for robot controllers, demonstrating that policies trained entirely in simulation can be deployed on physical robots and respond to unplanned environmental changes. The work represents a shift from open-loop to closed-loop control systems in robotics. This is a 2017 research milestone predating current frontier model work but relevant to the historical trajectory of OpenAI's robotics program.

Agent and Tool Ecosystem · sim-to-real transfer · closed-loop control · OpenAI

5 · OpenAI Blog · 1mo ago

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

OpenAI published RL², a meta-reinforcement learning approach in which a slow outer RL process trains a recurrent neural network whose hidden state encodes a fast inner learning algorithm. Because the hidden state persists across the episodes of a trial, the agent can adapt to a new task within a handful of episodes by leveraging experience accumulated across many training tasks. This work is an early contribution to meta-learning for RL, predating the modern agent and LLM era but relevant to the intellectual lineage of in-context and few-shot learning in AI systems.

Agent and Tool Ecosystem · Alignment and RLHF · Recurrent Neural Network · Reinforcement Learning · OpenAI · +1 more

5 · OpenAI Blog · 1mo ago

On First-Order Meta-Learning Algorithms

OpenAI published research on first-order meta-learning algorithms, presenting simplified variants of MAML (Model-Agnostic Meta-Learning) that omit second-order derivatives while retaining competitive performance. The work demonstrates that first-order approximations are surprisingly effective for few-shot learning tasks. This contributed to the broader understanding of gradient-based meta-learning efficiency and scalability.

Alignment and RLHF · First-Order MAML · OpenAI · MAML (Model-Agnostic Meta-Learning) · +1 more
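
To make "omitting second-order derivatives" concrete, the sketch below runs a first-order MAML-style update on a toy problem. It is an illustration under stated assumptions, not code from the paper: each task is a scalar quadratic loss `0.5 * (theta - target) ** 2`, the same loss is reused for the inner and outer steps (a real setup would typically use separate support and query data), and all function names and learning rates are hypothetical.

```python
def grad(theta, target):
    """Gradient of the toy task loss 0.5 * (theta - target) ** 2."""
    return theta - target


def inner_adapt(theta, target, inner_lr, steps):
    """Run a few plain gradient steps on one task, starting from theta."""
    for _ in range(steps):
        theta = theta - inner_lr * grad(theta, target)
    return theta


def fomaml_step(theta, task_targets, inner_lr=0.1, outer_lr=0.05, steps=3):
    """One first-order meta-update over a batch of tasks.

    The outer gradient is the task gradient evaluated at the adapted
    parameters. The dependence of the adapted parameters on theta is ignored,
    which is the second-order term that full MAML would backpropagate through.
    """
    meta_grad = 0.0
    for target in task_targets:
        adapted = inner_adapt(theta, target, inner_lr, steps)
        meta_grad += grad(adapted, target)
    return theta - outer_lr * meta_grad / len(task_targets)


# Meta-train an initialization for two toy tasks with targets 1.0 and 3.0.
theta = 0.0
for _ in range(500):
    theta = fomaml_step(theta, [1.0, 3.0])
```

On this toy problem the initialization drifts toward the point between the two task optima, from which either task is a few gradient steps away.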

5 · arXiv · cs.AI · 11d ago

DARP: Semi-parametric retrieval-based imitation learning reduces compounding errors by 15-46%

Researchers introduce DARP (Difference-Aware Retrieval Policies), a semi-parametric imitation learning method that retrieves k-nearest neighbor demonstrations at inference time and predicts actions based on relative distance vectors between neighbor and query states. The approach reparameterizes behavior cloning around local neighborhood structure rather than global state-to-action mappings, requiring no additional data collection or online expert feedback. Across continuous control and robotic manipulation tasks, DARP shows 15-46% performance improvements over standard behavior cloning, including on high-dimensional visual inputs.

Agent and Tool Ecosystem · DARP · Difference-Aware Retrieval Policies for Imitation Learning

7 · OpenAI Blog · 1mo ago

Learning from Human Preferences: OpenAI and DeepMind Collaborate on Reward Learning from Comparisons

OpenAI, in collaboration with DeepMind's safety team, published a method for learning reward functions directly from human preference comparisons between pairs of agent behaviors, eliminating the need to hand-code goal functions. The algorithm infers human intent by asking evaluators which of two proposed behaviors is preferable, addressing risks from misspecified reward functions. This work is an early foundational contribution to what would become reinforcement learning from human feedback (RLHF). It targets both safety and alignment concerns around reward hacking and proxy gaming.

Evaluation and Benchmarking · AI Safety Research · Reward Learning from Comparisons · DeepMind · Reinforcement Learning from Human Feedback · +2 more
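
One common way to turn pairwise comparisons like those in the summary above into a training signal is a Bradley-Terry-style model: the probability that behavior A is preferred over behavior B is a logistic function of the difference in their predicted total rewards, and the reward model is trained with cross-entropy against the human's choice. The sketch below assumes that formulation; the function names are illustrative, and `reward_a` and `reward_b` stand in for the summed outputs of a hypothetical reward model over each behavior.

```python
import math


def preference_probability(reward_a, reward_b):
    """Modelled probability that behavior A is preferred over behavior B."""
    diff = reward_a - reward_b
    # Numerically stable logistic function of the reward difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(reward_a, reward_b, human_prefers_a):
    """Cross-entropy between the modelled preference and the human's choice."""
    p_a = preference_probability(reward_a, reward_b)
    p_chosen = p_a if human_prefers_a else 1.0 - p_a
    # Clamp to avoid log(0) when the model is extremely confident and wrong.
    return -math.log(max(p_chosen, 1e-12))


# The loss is small when the reward model agrees with the evaluator
# and large when it disagrees.
agree = preference_loss(2.0, 0.0, human_prefers_a=True)
disagree = preference_loss(0.0, 2.0, human_prefers_a=True)
```

Minimizing this loss over many labelled pairs pushes the reward model to score preferred behaviors higher, and the learned reward can then stand in for a hand-coded goal function during RL training.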

10 · OpenAI Blog · 1mo ago

Language models are few-shot learners

OpenAI published the GPT-3 paper introducing a 175-billion-parameter autoregressive language model demonstrating strong few-shot learning capabilities across a wide range of NLP tasks. The work showed that scaling language models dramatically improves task-agnostic, few-shot performance, in some cases becoming competitive with fine-tuned models without any gradient updates. This paper became a foundational milestone in the development of large language models and the modern AI landscape.

Long Context Evolution · Frontier Model Releases · GPT-3 · Language Models are Few-Shot Learners · few-shot learning · +4 more
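
Few-shot learning "without any gradient updates" means the task examples are supplied in the prompt itself rather than through training. The sketch below shows one way such a prompt could be assembled. The `Input:`/`Output:` template and the function name are illustrative assumptions, not a format prescribed by the paper, and no model call is made.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a prompt containing K solved examples followed by one open query.

    examples is a list of (input_text, output_text) pairs. The model is expected
    to continue the text after the final "Output:", with no weight updates.
    """
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
```

Passing an empty `examples` list yields a zero-shot prompt and a single pair yields a one-shot prompt, so the same template covers the different settings.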