technique
TV loss
Status: active · provisional
tv-loss-00ea6401 · 1 event · first seen 6d ago
Aliases: TV loss
Recent events (1)
Bebop: MTP with rejection sampling and TV loss achieves up to 1.8x RL training speedup
Researchers introduce Bebop, a framework for integrating Multi-Token Prediction (MTP) into large-scale reinforcement learning (RL) training pipelines for LLMs. The work finds that MTP acceptance rates degrade during RL because of entropy fluctuations. It proposes two remedies: probabilistic rejection sampling, and a novel end-to-end Total Variation (TV) loss that directly optimizes multi-step acceptance rates. These reach acceptance rates of up to 95% and 25% additional inference throughput. Applied to Qwen3.5, Qwen3.6, and Qwen3.7 models, the method yields up to 1.8x end-to-end speedup in asynchronous RL training. Because MTP is trained before RL with the proposed objectives, the approach avoids costly online MTP updates during RL.
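The summary does not give Bebop's exact objective, so the following is only a minimal sketch of why a TV loss targets acceptance directly. It relies on a standard fact about speculative (rejection) sampling as described by Leviathan et al. and Chen et al.: a draft token drawn from the MTP distribution q is accepted against the target distribution p with probability sum_x min(p(x), q(x)) = 1 - TV(p, q). The rejection rule shown is that standard scheme, not necessarily Bebop's "probabilistic rejection sampling". The multi-step objective `tv_loss`, which assumes per-position acceptances are independent, and all function names are hypothetical illustrations rather than the paper's implementation.

```python
import random


def tv_distance(p, q):
    """Total variation distance between two discrete distributions on the same vocabulary."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def acceptance_prob(p, q):
    """Probability that x ~ q is accepted against target p under speculative sampling:
    sum_x q(x) * min(1, p(x) / q(x)) = sum_x min(p(x), q(x)) = 1 - TV(p, q)."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def speculative_step(p, q, rng):
    """One step of standard speculative sampling (swapped-in technique, not Bebop's code).
    Draw x from the draft q, accept with prob min(1, p(x) / q(x)), otherwise resample
    from the residual max(0, p - q). The returned token is distributed exactly as p."""
    x = rng.choices(range(len(q)), weights=q)[0]  # zero-weight tokens are never drawn
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # A rejection implies p != q, so the residual has positive mass.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False


def multi_step_acceptance(ps, qs):
    """Expected number of accepted draft tokens over K MTP positions.
    Assumption: per-position acceptances alpha_k = 1 - TV(p_k, q_k) are independent,
    so P(at least k accepted) = prod_{j<=k} alpha_j and E[accepted] = sum_k of that."""
    expected, running = 0.0, 1.0
    for p, q in zip(ps, qs):
        running *= 1.0 - tv_distance(p, q)
        expected += running
    return expected


def tv_loss(ps, qs):
    """Hypothetical multi-step TV objective: K minus the expected accepted length.
    It is 0 when every draft distribution matches its target; minimizing it (with an
    autodiff framework in practice) maximizes the expected number of accepted tokens."""
    return len(ps) - multi_step_acceptance(ps, qs)


if __name__ == "__main__":
    p = [0.6, 0.3, 0.1]  # target (policy) next-token distribution, illustrative values
    q = [0.5, 0.2, 0.3]  # MTP draft distribution, illustrative values
    print("TV:", round(tv_distance(p, q), 6))  # 0.2
    print("acceptance:", round(acceptance_prob(p, q), 6))  # 0.8
    print("loss over 2 positions:", round(tv_loss([p, p], [q, q]), 6))
```

The design choice to express the loss through TV rather than, say, KL divergence follows from the identity above: TV is exactly one minus the single-step acceptance probability, so lowering it raises acceptance without a proxy. Under the independence assumption, errors at early MTP positions also shrink the expected accepted length at every later position, which is why a multi-step objective weights them more heavily.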