Progressive OPD (POPD)
technique · active · provisional
progressive-opd-popd--63deb7a0 · 1 event · first seen 16d ago
Aliases: Progressive OPD (POPD)
Recent events (1)
Are Full Rollouts Necessary for On-Policy Distillation?
This paper investigates whether full rollouts are required during on-policy distillation (OPD) of reasoning models, and it identifies the rollout horizon as a key computational bottleneck. The authors propose two strategies: Progressive OPD (POPD), which gradually expands the rollout horizon during training, and Truncated OPD (TOPD), which permanently truncates rollouts. In experiments on mathematical reasoning, POPD improves training efficiency by up to 3×, while TOPD matches full OPD performance using only 10% of the rollout horizon, yielding significant wall-clock and memory savings.
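The summary describes the two strategies only as "expand the horizon over training" versus "truncate it permanently". The following is a minimal sketch that contrasts the resulting horizon schedules and their rollout-token budgets. It assumes the horizon is a per-rollout cap on generated tokens, and it assumes a linear POPD ramp that starts at the TOPD length; the paper's actual schedule is not given here. All function names (`full_opd_horizon`, `popd_horizon`, `topd_horizon`, `total_rollout_tokens`) and all numbers (8,192-token maximum, 1,000-step ramp, 2,000 steps) are hypothetical.

```python
"""Hypothetical rollout-horizon schedules for on-policy distillation (OPD).

Assumptions (illustrative, not from the paper):
  * the horizon is a cap on tokens the student generates per rollout;
  * POPD ramps linearly from `start_len` to `max_len` over `ramp_steps`;
  * TOPD keeps a fixed cap of roughly 10% of `max_len`.
"""
from functools import partial
from typing import Callable


def full_opd_horizon(step: int, max_len: int) -> int:
    # Full OPD: every rollout may run to the full horizon.
    return max_len


def popd_horizon(step: int, max_len: int, start_len: int, ramp_steps: int) -> int:
    # Progressive OPD (hypothetical linear ramp): begin with short rollouts and
    # widen the horizon until it reaches max_len. Integer math keeps it exact.
    if step >= ramp_steps:
        return max_len
    return start_len + (max_len - start_len) * step // ramp_steps


def topd_horizon(step: int, trunc_len: int) -> int:
    # Truncated OPD: the horizon never grows past the fixed truncation length.
    return trunc_len


def total_rollout_tokens(schedule: Callable[[int], int], num_steps: int) -> int:
    # Upper bound on generated tokens per sequence over training, assuming every
    # rollout runs to its horizon (no early end-of-sequence).
    return sum(schedule(step) for step in range(num_steps))


if __name__ == "__main__":
    num_steps, max_len, ramp_steps = 2000, 8192, 1000  # hypothetical settings
    trunc_len = max_len // 10  # ~10% of the full horizon, as in the TOPD summary
    schedules = {
        "full OPD": partial(full_opd_horizon, max_len=max_len),
        "POPD": partial(popd_horizon, max_len=max_len,
                        start_len=trunc_len, ramp_steps=ramp_steps),
        "TOPD": partial(topd_horizon, trunc_len=trunc_len),
    }
    full = total_rollout_tokens(schedules["full OPD"], num_steps)
    for name, sched in schedules.items():
        tokens = total_rollout_tokens(sched, num_steps)
        print(f"{name:8s} first={sched(0):5d} last={sched(num_steps - 1):5d} "
              f"tokens={tokens:>12,d} ({tokens / full:.1%} of full OPD)")
```

The token totals are only a rough proxy for cost. Generation time and KV-cache memory grow with sequence length, which is why shorter horizons help. However, the sketch ignores early termination and any effect on final model quality, so the printed ratios should not be read as the paper's reported speedups.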