Entity · technique

Truncated OPD (TOPD)

technique · active · truncated-opd-topd--d9fd831e · 1 event · first seen Jun 1, 2026

Aliases: Truncated OPD (TOPD)

Co-occurring entities

On-Policy Distillation (OPD) · mathematical reasoning · Progressive OPD (POPD) · Reinforcement Learning with Verifiable Rewards

More like this (12)

FastOPD · Progressive OPD (POPD) · d-OPSD · β-OPSD · DOPD · X³-OPD · On-Policy Distillation (OPD) · Vision-OPD · OPSD · DPPO · OPT · DanceOPD

Recent events (1)

arXiv · cs.CL · Jun 1, 2026

Are Full Rollouts Necessary for On-Policy Distillation?

This paper asks whether full rollouts are required during on-policy distillation (OPD) for training reasoning models, and identifies the rollout horizon as a key computational bottleneck. The authors propose two strategies: Progressive OPD (POPD), which gradually expands the rollout horizon over the course of training, and Truncated OPD (TOPD), which keeps rollouts truncated for the entire run. In experiments on mathematical reasoning, POPD improves training efficiency by up to 3×, while TOPD matches full OPD performance using only 10% of the rollout horizon, yielding significant savings in wall-clock time and memory.
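To make the difference between the strategies concrete, here is a minimal sketch contrasting the rollout-horizon schedules of full OPD, a POPD-style ramp, and a TOPD-style fixed truncation, plus a distillation loss computed only over the truncated prefix of a student rollout. Everything except the 10% TOPD horizon is an assumption for illustration: the helper names (`rollout_horizon`, `reverse_kl`, `truncated_distill_loss`) are hypothetical, the linear POPD ramp and its 10% starting fraction are invented, and the per-token full-vocabulary reverse KL is one common OPD objective, not necessarily the paper's.

```python
import math
from typing import Sequence


def rollout_horizon(step: int, total_steps: int, max_len: int, mode: str,
                    trunc_frac: float = 0.10, start_frac: float = 0.10) -> int:
    """Hypothetical helper: tokens a student rollout may generate at `step`.

    The linear POPD ramp and `start_frac` are assumptions, not the paper's schedule.
    """
    if mode == "full":
        # Full OPD: every rollout may run to the maximum generation length.
        return max_len
    if mode == "truncated":
        # TOPD-style: one short, fixed horizon for all of training (10% by default).
        return max(1, round(max_len * trunc_frac))
    if mode == "progressive":
        # POPD-style (assumed linear): grow from start_frac * max_len to max_len.
        progress = min(1.0, step / max(1, total_steps))
        frac = start_frac + (1.0 - start_frac) * progress
        return max(1, round(max_len * frac))
    raise ValueError(f"unknown mode: {mode!r}")


def reverse_kl(student_logprobs: Sequence[float],
               teacher_logprobs: Sequence[float]) -> float:
    """KL(student || teacher) at one position, from log-probs over the same vocabulary."""
    return sum(math.exp(s) * (s - t) for s, t in zip(student_logprobs, teacher_logprobs))


def truncated_distill_loss(student_steps, teacher_steps, horizon: int) -> float:
    """Mean per-token reverse KL over the first `horizon` positions of a rollout.

    Both inputs are per-position log-prob vectors evaluated on the same prefix,
    which was sampled from the student (the "on-policy" part of OPD).
    """
    pairs = list(zip(student_steps, teacher_steps))[:horizon]
    if not pairs:
        raise ValueError("rollout is empty or horizon < 1")
    return sum(reverse_kl(s, t) for s, t in pairs) / len(pairs)


if __name__ == "__main__":
    total, max_len = 100, 1000
    for step in (0, 50, 100):
        print(step,
              rollout_horizon(step, total, max_len, "full"),
              rollout_horizon(step, total, max_len, "progressive"),
              rollout_horizon(step, total, max_len, "truncated"))
```

In this sketch, the POPD-style ramp starts at the same short horizon as TOPD and reaches full length by the end of training, so its savings come from the early phase, whereas the TOPD-style schedule never generates or scores more than 10% of the maximum length.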

Training Infrastructure · Frontier Model Releases · On-Policy Distillation (OPD) · mathematical reasoning · Truncated OPD (TOPD)