Entity · benchmark

Montezuma's Revenge

benchmark · active · montezuma-s-revenge-114235bd · 2 events · first seen May 20, 2026

Aliases: Montezuma's Revenge

Co-occurring entities

OpenAI · OpenAI Five · PPO · Random Network Distillation · Yuri Burda · Harrison Edwards

More like this (12)

Zama · Llama Guard 4 · Minecraft · Dark Experience Replay · Flash Attention 2 · Llama 2 · Rocket Money · Llama 3 · Game Arena · Mamba2 · Sudoku-Extreme · ShadowHand

Recent events (2)

5 · OpenAI Blog · May 20, 2026

Learning Montezuma's Revenge from a Single Demonstration

OpenAI trained a reinforcement learning agent to achieve a score of 74,500 on Montezuma's Revenge using a single human demonstration, surpassing all previously published results. The method is straightforward: the agent plays episodes starting from carefully selected states drawn from the demonstration, optimizing game score via PPO. This approach demonstrates that imitation-seeded curriculum learning can dramatically improve exploration in hard-exploration environments. The same PPO algorithm underpins OpenAI Five.
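The start-state selection described above can be sketched as a small curriculum object. This is a minimal illustration, not OpenAI's implementation: the class name `DemoStartCurriculum`, its parameters, and the rule of moving the start point backward through the demonstration once the agent succeeds often enough are all assumptions made for the example. Restoring the emulator to the chosen state and the PPO update itself are left out.

```python
from collections import deque
import random


class DemoStartCurriculum:
    """Choose episode start states from a single demonstration.

    Hypothetical sketch: episodes begin near the end of the demonstration,
    and the start point moves earlier once the agent succeeds often enough.
    """

    def __init__(self, demo_states, window=20, threshold=0.2,
                 step_back=1, jitter=0, seed=0):
        if not demo_states:
            raise ValueError("demonstration must contain at least one state")
        self.demo_states = list(demo_states)
        self.start_idx = len(self.demo_states) - 1  # begin at the last state
        self.threshold = threshold    # success rate required to move earlier
        self.step_back = step_back    # demo steps to move back on promotion
        self.jitter = jitter          # sample up to this many steps earlier
        self.results = deque(maxlen=window)
        self.rng = random.Random(seed)

    def sample_start(self):
        """Return (index, state) for the next episode's starting point."""
        lowest = max(0, self.start_idx - self.jitter)
        idx = self.rng.randint(lowest, self.start_idx)
        return idx, self.demo_states[idx]

    def report(self, success):
        """Record whether the episode succeeded (e.g. matched the demo score)."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) >= self.threshold:
            self.start_idx = max(0, self.start_idx - self.step_back)
            self.results.clear()  # measure success afresh from the new start
```

In a training loop, each episode would call `sample_start()`, reset the environment to the returned state, collect a rollout for PPO, and then call `report(...)` with the outcome. Once `start_idx` reaches 0, the agent is playing from the true beginning of the game.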

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI Five · PPO · OpenAI · +1 more

6 · OpenAI Blog · May 20, 2026

Reinforcement Learning with Prediction-Based Rewards (Random Network Distillation)

OpenAI introduces Random Network Distillation (RND), a curiosity-driven exploration method for reinforcement learning. A predictor network is trained to match the output of a fixed, randomly initialized network, and its prediction error serves as an intrinsic reward: the error is high on unfamiliar observations and falls as they are revisited. RND is the first method to exceed average human performance on Montezuma's Revenge, a notoriously hard-exploration Atari game, without using demonstrations or access to the underlying game state. The approach is simple to implement and compatible with standard RL algorithms, offering a scalable alternative to count-based or dynamics-model exploration bonuses.
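The prediction-error reward can be illustrated with a toy, dependency-free sketch. Everything here is an assumption made for the example: the class name `RND`, the tiny `tanh` target, the linear predictor, and the hyperparameters stand in for the neural networks and Atari frames used in practice. The point is only to show the mechanism: the target stays fixed, the predictor is trained on visited observations, and the remaining error is the exploration bonus.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=1.0):
    """Random Gaussian weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product for list-based matrices."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class RND:
    """Toy Random Network Distillation bonus (illustrative, not the paper's code)."""

    def __init__(self, obs_dim, feat_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.target = make_matrix(feat_dim, obs_dim, rng)        # fixed, never trained
        self.predictor = make_matrix(feat_dim, obs_dim, rng, 0.1)  # trained online
        self.lr = lr

    def _target_features(self, obs):
        return [math.tanh(v) for v in matvec(self.target, obs)]

    def intrinsic_reward(self, obs):
        """Mean squared error between predictor and fixed target outputs."""
        target_out = self._target_features(obs)
        pred_out = matvec(self.predictor, obs)
        return sum((p - t) ** 2 for p, t in zip(pred_out, target_out)) / len(target_out)

    def update(self, obs):
        """One gradient step moving the predictor toward the target on obs."""
        target_out = self._target_features(obs)
        pred_out = matvec(self.predictor, obs)
        for i, (p, t) in enumerate(zip(pred_out, target_out)):
            grad_scale = 2.0 * (p - t) / len(target_out)
            row = self.predictor[i]
            for j, x in enumerate(obs):
                row[j] -= self.lr * grad_scale * x


rnd = RND(obs_dim=4)
visited = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):
    rnd.update(visited)  # the bonus for this observation shrinks with each visit
```

After the loop, `rnd.intrinsic_reward(visited)` is much smaller than it was initially, while an observation the predictor has not been trained on, such as `[0.0, 1.0, 0.0, 0.0]`, keeps its original bonus. In an RL loop this bonus would be added to, or used alongside, the game score when computing the reward for the policy update.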

Evaluation and Benchmarking · AI Safety Research · OpenAI · Random Network Distillation · Yuri Burda · +2 more