Entity · technique

TV loss

technique · active · tv-loss-00ea6401 · 1 event · first seen Jun 11, 2026

Aliases: TV loss

Co-occurring entities

Multi-Token Prediction (MTP) · speculative decoding · Bebop · Qwen3

More like this (12)

TVL/HCT · CKTN · Radio Télévision Luxembourg · Channel Corp · HTV-Agent · CNN · Cox Media Group · τ-Voice · TimesFM · FedTSV · τ³-Telecom · Video Arena

Recent events (1)

6 · arXiv · cs.LG · Jun 11, 2026

Bebop: MTP with rejection sampling and TV loss achieves 1.8x RL training speedup

Researchers introduce Bebop, a framework for integrating Multi-Token Prediction (MTP) into large-scale RL training pipelines for LLMs. The work finds that MTP acceptance rates degrade during RL because of entropy fluctuations. To counter this, it proposes probabilistic rejection sampling together with a novel end-to-end Total Variation (TV) loss that directly optimizes multi-step acceptance rates, achieving acceptance rates of up to 95% and 25% additional inference throughput. Applied to Qwen3.5, Qwen3.6, and Qwen3.7 models, the method yields up to 1.8x end-to-end acceleration in asynchronous RL training. Because the MTP module is trained with the proposed objectives before RL begins, the approach avoids costly online MTP updates.
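The summary does not spell out the form of the loss, so the following is only a minimal sketch of why a total variation objective lines up with acceptance rates. It assumes the standard speculative-decoding rejection rule, under which a token drafted from q is accepted against the target p with probability sum(min(p, q)) = 1 - TV(p, q). All function names are hypothetical, and the multi-step objective (one minus the product of per-step acceptance rates, with each step's distributions taken as fixed) is an illustrative assumption, not necessarily Bebop's actual formulation.

```python
import random

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def acceptance_rate(p, q):
    """Probability that a token drafted from q is accepted against target p
    under standard speculative rejection sampling: sum_x min(p(x), q(x))."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def multi_step_tv_loss(targets, drafts):
    """Hypothetical multi-step objective (an assumption for illustration):
    one minus the probability that every drafted step is accepted,
    treating each step's (p, q) pair as fixed."""
    accept_all = 1.0
    for p, q in zip(targets, drafts):
        accept_all *= 1.0 - tv_distance(p, q)
    return 1.0 - accept_all

def rejection_sample(p, q, rng):
    """One step of standard speculative rejection sampling.
    Returns (token, accepted). The returned token is distributed as p."""
    token = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[token] / q[token]):
        return token, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # target model distribution (made-up numbers)
    q = [0.4, 0.4, 0.2]  # draft (MTP head) distribution (made-up numbers)
    rng = random.Random(0)
    trials = 20000
    accepted = sum(rejection_sample(p, q, rng)[1] for _ in range(trials))
    print("TV distance:", tv_distance(p, q))
    print("analytic acceptance:", acceptance_rate(p, q))
    print("empirical acceptance:", accepted / trials)
    print("two-step loss:", multi_step_tv_loss([p, p], [q, q]))
```

In this toy setting, lowering the TV distance between the draft and target distributions raises the acceptance probability by the same amount, which is the sense in which a TV loss can optimize acceptance directly rather than through a proxy such as cross-entropy.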

Training Infrastructure · Inference Economics · Multi-Token Prediction (MTP) · speculative decoding · TV loss · +3 more