Entity · technique

SCOPE

technique · active · scope-7685546e · 1 event · first seen Jun 1, 2026

Aliases: SCOPE

Co-occurring entities

Qwen2.5 · self-play · GRPO · OLMo-3 · Qwen3

More like this (12)

SCOPE-RL · ModelScope · SPEAR · TimeScope · PlanetScope · Llama Scope · AgentScope · SPEX · OPSD · CoSER · SPECTRA · Gemma Scope 2

Recent events (1)

7 · arXiv · cs.CL · Jun 1, 2026

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

SCOPE is a data-free self-play framework for training language models on open-ended tasks without external supervision or frontier-model judges. It co-evolves two policies—a Challenger that generates document-grounded tasks and a Solver that answers via multi-turn retrieval—using a frozen copy of the initial model as a self-judge that writes task-specific rubrics. Across three 7-8B models (Qwen2.5, Qwen3, OLMo-3), SCOPE achieves up to +10.4 points on eight open-ended benchmarks and +13.8 points on seven held-out short-form QA benchmarks, matching or exceeding GRPO trained on ~9K curated prompts. Ablations identify rubric generation quality as the primary bottleneck for self-judging.
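The loop described above can be pictured with a small toy sketch. This is not the paper's implementation: the names `Task`, `score_with_rubric`, and `self_play_step` are hypothetical, the three roles are plain Python callables rather than language models, the rubric check is a keyword match standing in for the frozen self-judge, and the Challenger reward (highest when the Solver succeeds about half the time) is an assumption made for illustration, since the summary does not say how the Challenger is rewarded.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    """A document-grounded task produced by the Challenger."""
    document: str
    question: str


def score_with_rubric(answer: str, rubric: List[str]) -> float:
    """Toy judge: fraction of rubric items that appear in the answer.

    A stand-in for the frozen self-judge; a real judge would be a model call.
    """
    if not rubric:
        return 0.0
    hits = sum(1 for item in rubric if item.lower() in answer.lower())
    return hits / len(rubric)


def self_play_step(
    document: str,
    challenger: Callable[[str], Task],
    solver: Callable[[Task, int], str],
    rubric_writer: Callable[[Task], List[str]],
    n_samples: int = 4,
) -> Tuple[Task, List[float], float]:
    """Run one Challenger/Solver round and return rewards for both policies."""
    task = challenger(document)        # Challenger proposes a task
    rubric = rubric_writer(task)       # frozen judge writes a task-specific rubric
    answers = [solver(task, i) for i in range(n_samples)]  # Solver samples answers
    solver_rewards = [score_with_rubric(a, rubric) for a in answers]

    # ASSUMPTION (not from the source): reward the Challenger for tasks of
    # intermediate difficulty, peaking when the mean Solver score is 0.5.
    mean_score = sum(solver_rewards) / len(solver_rewards)
    challenger_reward = 1.0 - abs(2.0 * mean_score - 1.0)
    return task, solver_rewards, challenger_reward


# Toy stand-ins for the three roles (all hypothetical).
def toy_challenger(document: str) -> Task:
    return Task(document, "Who writes the rubric?")


def toy_rubric_writer(task: Task) -> List[str]:
    return ["rubric", "judge"]


def toy_solver(task: Task, sample_index: int) -> str:
    # Alternate between a good and a bad answer to mimic sampling variance.
    return "The judge writes a rubric." if sample_index % 2 == 0 else "No idea."


if __name__ == "__main__":
    doc = "A frozen copy of the model acts as judge and writes a rubric."
    _, rewards, c_reward = self_play_step(
        doc, toy_challenger, toy_solver, toy_rubric_writer
    )
    print(rewards, c_reward)
```

In a real training setup the per-sample Solver rewards would feed a policy-gradient update and the Challenger reward would update the task generator. Both updates are omitted here because the summary gives no detail on them.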

Evaluation and Benchmarking Open Weights Progress SCOPE Qwen2.5 self-play