Entity · technique

self-play

technique · active · self-play-4c2c7265 · 3 events · first seen May 20, 2026

Aliases: self-play

Co-occurring entities

Reinforcement Learning · OpenAI · SCOPE · Qwen2.5 · GRPO · OLMo-3 · Qwen3 · OpenAI Five · Dota 2

More like this (12)

self-play reinforcement learning · Competitive Self-Play · Skill Self-Play · Self-Generated Replay · Self-Instruct · self-attention · NPC-Playground · self-driving cars · Multi-Agent Fictitious Play · RePlaid · Hindsight Experience Replay · Dark Experience Replay

Recent events (3)

7 · arXiv · cs.CL · Jun 1, 2026

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

SCOPE is a data-free self-play framework for training language models on open-ended tasks without external supervision or frontier-model judges. It co-evolves two policies—a Challenger that generates document-grounded tasks and a Solver that answers via multi-turn retrieval—using a frozen copy of the initial model as a self-judge that writes task-specific rubrics. Across three 7-8B models (Qwen2.5, Qwen3, OLMo-3), SCOPE achieves up to +10.4 points on eight open-ended benchmarks and +13.8 points on seven held-out short-form QA benchmarks, matching or exceeding GRPO trained on ~9K curated prompts. Ablations identify rubric generation quality as the primary bottleneck for self-judging.

Evaluation and Benchmarking · Open Weights Progress · SCOPE · Qwen2.5 · self-play · +5 more
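The co-evolution loop described above can be illustrated with a deliberately tiny, hypothetical toy that shows only the control flow: a Challenger proposes a task grounded in a document, a Solver answers, and a frozen judge writes a task-specific rubric and scores the answer. Everything concrete here is an assumption for illustration (integer "documents", slice-sum tasks, scalar "skill" and "difficulty", and the names `Challenger`, `Solver`, `frozen_judge`, `co_evolve`); SCOPE itself uses language-model policies, multi-turn retrieval, and model-written rubrics, and the summary does not describe its update rules.

```python
import random
from dataclasses import dataclass

# Hypothetical toy stand-ins: a "document" is a list of positive integers,
# a task asks for the sum of a slice of it, and each policy is one integer.

@dataclass
class Challenger:
    difficulty: int = 2  # slice length the Challenger asks about

    def propose(self, doc, rng):
        length = min(self.difficulty, len(doc))
        start = rng.randrange(0, len(doc) - length + 1)
        return doc, start, length  # a task grounded in `doc`

    def update(self, solved):
        # Harder tasks after a success, easier after a failure, so the
        # Challenger tracks the edge of the Solver's current ability.
        self.difficulty = self.difficulty + 1 if solved else max(1, self.difficulty - 1)

@dataclass
class Solver:
    skill: int = 2  # longest slice the Solver can sum correctly

    def answer(self, task):
        doc, start, length = task
        # Sums at most `skill` items, so it under-counts tasks beyond its skill
        return sum(doc[start : start + min(length, self.skill)])

    def update(self, reward):
        # Crude stand-in for an RL step: each failed task raises skill by one
        if reward == 0.0:
            self.skill += 1

def frozen_judge(task, answer):
    # Stand-in for the frozen initial model: it writes a task-specific
    # "rubric" (here simply the reference sum) and grades against it.
    doc, start, length = task
    rubric = sum(doc[start : start + length])
    return 1.0 if answer == rubric else 0.0

def co_evolve(rounds=30, seed=0):
    rng = random.Random(seed)
    challenger, solver = Challenger(), Solver()
    for _ in range(rounds):
        doc = [rng.randint(1, 9) for _ in range(50)]
        task = challenger.propose(doc, rng)
        reward = frozen_judge(task, solver.answer(task))
        challenger.update(reward == 1.0)
        solver.update(reward)
    return challenger, solver

challenger, solver = co_evolve()
print(challenger.difficulty, solver.skill)  # 12 12
```

Because neither side is given a fixed curriculum, task difficulty and solver skill rise together: each failure pushes the Solver up, and each success pushes the Challenger up. The toy judge is only reliable because it can compute the reference answer exactly; the summary's point that rubric quality is the main bottleneck corresponds to exactly the step this toy makes trivial.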

3 · OpenAI Blog · May 20, 2026

Learning to Cooperate, Compete, and Communicate

OpenAI published early research on multi-agent environments as a pathway toward AGI, arguing that competitive multi-agent settings provide a natural curriculum and continuous pressure to improve. The post highlights two key properties: the difficulty of the environment scales with the skill of the competitors, and no stable equilibrium exists, so the pressure to keep learning never disappears. The work presents multi-agent environments as fundamentally different from single-agent RL and calls for substantial further research.

Evaluation and Benchmarking · Agent and Tool Ecosystem · self-play · Reinforcement Learning · OpenAI
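Both properties can be seen in a minimal sketch, assuming rock-paper-scissors as the game (an illustrative choice, not an environment from the post). If an agent repeatedly replaces itself with the best response to its previous version, each new version beats the last, yet the process never settles, because every pure strategy is beaten by some other move. The game does have a mixed equilibrium (each move with probability 1/3); the cycling here is among pure strategies. The names `best_response` and `best_response_dynamics` are hypothetical.

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value

def best_response(opponent_move):
    # Return the unique move that beats the opponent's move
    return next(m for m in MOVES if BEATS[m] == opponent_move)

def best_response_dynamics(start, steps):
    # Naive self-play: each new policy is the best response to the previous one
    history = [start]
    for _ in range(steps):
        history.append(best_response(history[-1]))
    return history

print(best_response_dynamics("rock", 6))
# ['rock', 'paper', 'scissors', 'rock', 'paper', 'scissors', 'rock']
```

Every step is a strict improvement against the immediately preceding opponent, but the sequence returns to its starting point after three steps; this is one reason practical self-play systems often train against a pool of past versions rather than only the latest one.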

6 · OpenAI Blog · May 20, 2026

More on Dota 2: OpenAI Self-Play Reaches Superhuman Performance

OpenAI reports that a self-play reinforcement learning system progressed from below the level of high-ranked humans to beating top professional Dota 2 players within one month, playing only 1v1 mid-lane matches. The post highlights self-play as a mechanism that automatically improves the quality of the training data as the agent improves, in contrast to supervised learning's dependence on fixed datasets. The result is presented as evidence that self-play, given sufficient compute, can quickly close the gap to human-level performance and then surpass it.

Evaluation and Benchmarking · Agent and Tool Ecosystem · self-play · OpenAI Five · Dota 2 · +2 more
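The idea that self-play improves its own training data can be illustrated with a much smaller, hypothetical example: single-pile Nim (take 1 to 3 stones; whoever takes the last stone wins), where one shared value table plays both sides and is updated only at positions reached in its own games. This is a swapped-in toy (tabular negamax backups with epsilon-greedy exploration), not the method OpenAI used for Dota 2; `MAX_PILE`, `epsilon`, and the function names are assumptions.

```python
import random

MAX_PILE = 12
MOVES = (1, 2, 3)

def legal_moves(n):
    return [a for a in MOVES if a <= n]

def greedy_move(values, n):
    # Negamax: my value after taking a stones is minus the opponent's value at n - a
    return max(legal_moves(n), key=lambda a: -values[n - a])

def self_play_train(episodes=2000, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # values[n] = value for the player to move with n stones left.
    # values[0] = -1: the previous player took the last stone and won.
    values = [0.0] * (MAX_PILE + 1)
    values[0] = -1.0
    for _ in range(episodes):
        n = rng.randint(1, MAX_PILE)
        while n > 0:
            # Back up the current position from the shared table; both "players"
            # read and write the same table, so each learns from the other's play
            values[n] = max(-values[n - a] for a in legal_moves(n))
            # Both sides choose moves with the same epsilon-greedy policy
            if rng.random() < epsilon:
                a = rng.choice(legal_moves(n))
            else:
                a = greedy_move(values, n)
            n -= a
    return values

values = self_play_train()
print([n for n in range(1, MAX_PILE + 1) if values[n] < 0])  # [4, 8, 12]
```

With no external data, the table converges to the known solution of this game: piles that are multiples of 4 are losing for the player to move, and from any other pile the winning move is to take `n % 4` stones. As the shared policy improves, the positions it reaches in its own games become more informative, which is a small-scale analogue of the mechanism the post describes.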