6 · OpenAI Blog · 1mo ago

Learning to play Minecraft with Video PreTraining (VPT)

OpenAI trained a neural network to play Minecraft using Video PreTraining (VPT) on a large unlabeled video dataset of human gameplay, supplemented by a small amount of labeled contractor data. The model operates via native human interface inputs (keypresses and mouse movements) rather than game APIs. After fine-tuning, it can craft diamond tools—a task requiring over 20 minutes and ~24,000 actions for skilled humans. The work is framed as a step toward general computer-using agents.

Frontier Model Releases · Agent and Tool Ecosystem · Alignment and RLHF · VPT Model · Video PreTraining (VPT) · OpenAI · Minecraft

Related guides (4)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Frontier Model Releases · Topic guide

Frontier Model Releases: The Race From Language to Action

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Alignment and RLHF · Topic guide

Alignment and RLHF: From Human Feedback to Scalable Post-Training

Related events (8)

5 · OpenAI Blog · 1mo ago

Learning Montezuma's Revenge from a Single Demonstration

OpenAI trained a reinforcement learning agent to achieve a score of 74,500 on Montezuma's Revenge using a single human demonstration, surpassing all previously published results. The method is straightforward: the agent plays episodes starting from carefully selected states drawn from the demonstration, optimizing game score via PPO. This approach demonstrates that imitation-seeded curriculum learning can dramatically improve exploration in hard-exploration environments. The same PPO algorithm underpins OpenAI Five.

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI Five · PPO · OpenAI · +1 more
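
The start-state selection described in the Montezuma's Revenge summary above can be sketched as a small curriculum object. This is a minimal illustration, not OpenAI's implementation: the class name `DemoStartCurriculum`, the end-to-start ordering, the `success_threshold` of 0.2, and the `step_back` size are all assumptions made for the example, and the PPO training loop itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class DemoStartCurriculum:
    """Chooses which demonstration state each training episode resets to.

    Hypothetical helper: names and default values are illustrative only.
    """

    demo_length: int                # number of saved states in the demonstration
    step_back: int = 1              # how far the start point moves per promotion
    success_threshold: float = 0.2  # assumed fraction of successful episodes needed
    start_index: int = -1           # overwritten in __post_init__

    def __post_init__(self) -> None:
        # Begin at the last demonstration state, where little exploration is needed.
        self.start_index = self.demo_length - 1

    def update(self, successes: int, episodes: int) -> int:
        """Move the start point earlier once enough recent episodes succeeded."""
        if episodes > 0 and successes / episodes >= self.success_threshold:
            self.start_index = max(0, self.start_index - self.step_back)
        return self.start_index


# Example: four batches of 10 episodes with 3, 1, 2, and 5 successes.
curriculum = DemoStartCurriculum(demo_length=5)
history = [curriculum.update(successes=s, episodes=10) for s in (3, 1, 2, 5)]
```

In a real setup the RL algorithm (PPO in the summary) would train on episodes reset to `start_index`, and `update` would be called with the observed success counts after each batch.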

6 · OpenAI Blog · 1mo ago

More on Dota 2: OpenAI Self-Play Reaches Superhuman Performance

OpenAI reports that a self-play reinforcement learning system progressed from below high-ranked human level to beating top professional Dota 2 players within one month, using only 1v1 mid-lane play. The post highlights self-play as a mechanism that automatically improves training data quality as the agent improves, contrasting it with supervised learning's dependence on fixed datasets. The result is presented as evidence that sufficient compute combined with self-play can rapidly close and exceed human-level performance gaps.

Evaluation and Benchmarking · Agent and Tool Ecosystem · self-play · OpenAI Five · Dota 2 · +2 more

6 · arXiv · cs.CL · 3d ago

OmniAgent: POMDP-based active perception agent for long video understanding with test-time scaling

Researchers introduce OmniAgent, a multimodal agent that reformulates long video understanding as a POMDP-based iterative Observation-Thought-Action cycle, selectively distilling audio-visual cues into persistent textual memory rather than processing all frames uniformly. The system uses Agentic Supervised Fine-Tuning and a novel reinforcement learning method (TAURA) with turn-level entropy for credit assignment. OmniAgent demonstrates positive test-time scaling and achieves state-of-the-art open-source results across ten benchmarks, with its 7B model outperforming Qwen2.5-VL-72B on LVBench (50.5% vs. 47.3%).

Inference Economics · Agent and Tool Ecosystem · OmniAgent · Qwen2.5-VL-72B · LVBench · +4 more

3 · OpenAI Blog · 1mo ago

OpenAI Co-Organizes Procgen and MineRL NeurIPS 2020 Competitions

OpenAI announced co-organization of two NeurIPS 2020 competitions with AIcrowd, Carnegie Mellon University, and DeepMind, centered on the Procgen Benchmark and MineRL environments. These competitions are aimed at advancing research in procedurally generated environments and sequential decision-making in Minecraft-like settings. The announcement is from June 2020 and represents a collaborative academic competition initiative.

Evaluation and Benchmarking · NeurIPS 2020 · Carnegie Mellon University · DeepMind · +4 more

6 · arXiv · cs.CL · 20d ago

PithTrain: A Compact and Agent-Native MoE Training System

PithTrain is a new MoE training framework designed around 'agent-native' principles, enabling AI coding agents to more efficiently understand, operate, and extend the framework. The authors introduce a new evaluation dimension called agent-task efficiency (ATE) and an accompanying benchmark ATE-Bench to measure the cost of using coding agents on training-framework tasks. PithTrain matches the throughput of production frameworks while achieving up to 62% fewer Agent Turns and 64% less Active GPU Time on ATE-Bench compared to existing systems.

Training Infrastructure · Frontier Model Releases · ATE-Bench · Mixture of Experts · PithTrain · +3 more

3 · OpenAI Blog · 1mo ago

GamePad: A Learning Environment for Theorem Proving

OpenAI released GamePad, a learning environment designed to facilitate machine learning research on formal theorem proving. The tool provides an interface to the Coq proof assistant, enabling researchers to train models on proof states and tactics. This represents an early effort to apply ML techniques to automated mathematical reasoning and formal verification.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Coq · OpenAI · GamePad

5 · OpenAI Blog · 1mo ago

Competitive Self-Play Enables Emergent Physical Skills in Simulated Agents

OpenAI demonstrates that competitive self-play allows simulated agents to spontaneously develop complex physical skills—tackling, ducking, faking, kicking, catching, and diving—without explicit environment design for those behaviors. The self-play dynamic automatically calibrates difficulty to the agent's current skill level. Combined with concurrent Dota 2 self-play results, OpenAI expresses confidence that self-play will be a foundational component of powerful AI systems.

Agent and Tool Ecosystem · Alignment and RLHF · Dota 2 · Competitive Self-Play · OpenAI

6 · OpenAI Blog · 1mo ago

OpenAI Five Defeats Amateur Human Teams at Dota 2

OpenAI announced that OpenAI Five, a team of five neural networks trained via self-play, had begun defeating amateur human teams at Dota 2. The result was an early milestone in applying reinforcement learning to complex, long-horizon multi-agent environments. The system was trained with large-scale distributed RL, showing that neural networks could coordinate in real-time strategy games without hand-crafted rules.

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI Five · Dota 2 · Proximal Policy Optimization · +1 more