KL-regularized RL
technique · active · provisional
kl-regularized-rl-6685d9b1 · 1 event · first seen 16d ago
Aliases: KL-regularized RL
More like this (12)
Entropy-Regularized Reinforcement Learning
ExpRL: Exploratory RL for LLM Mid-Training
KL-Cov regularization
ExpRL
Competitive Programming RL
Constrained Reinforcement Learning
Recursive Language Models (RLMs)
RL²
RLIF (Reinforcement Learning from Internal Feedback)
TRL (Transformer Reinforcement Learning)
L0 regularization
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Recent events (1)
DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization
DRIFT is a training framework that bridges online RL and offline SFT for multi-turn LLM optimization by exploiting the theoretical equivalence between KL-regularized RL and importance-weighted supervised learning. It decouples rollout generation from policy optimization: trajectories are sampled from a fixed reference policy offline, weighted by return-based importance scores, and used for weighted SFT. Empirically, DRIFT matches or exceeds multi-turn RL baselines while retaining the efficiency and simplicity of standard supervised fine-tuning. Code is publicly released.
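For context, here is a standard result that is general and not specific to DRIFT. The KL-regularized objective max over π of E_π[R] − β·KL(π ‖ π_ref) has the closed-form optimum π*(τ) ∝ π_ref(τ)·exp(R(τ)/β). Fitting π_θ to π* by minimizing KL(π* ‖ π_θ) therefore reduces to maximum likelihood on trajectories sampled from π_ref, each weighted by exp(R/β). The Python sketch below assumes exactly that exponentiated-return weighting, self-normalized over a batch. DRIFT's actual importance scores may differ, and the function names, the `beta` parameter, and the per-token masking scheme are hypothetical illustrations.

```python
import math
from typing import Sequence


def importance_weights(returns: Sequence[float], beta: float) -> list[float]:
    """Self-normalized weights w_i proportional to exp(R_i / beta).

    Hypothetical helper (an assumption, not DRIFT's code): `returns` holds one
    scalar return per trajectory sampled offline from the fixed reference
    policy, and `beta` is the KL-regularization coefficient.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not returns:
        raise ValueError("need at least one trajectory")
    scaled = [r / beta for r in returns]
    shift = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sft_loss(
    token_logprobs: Sequence[Sequence[float]],
    policy_masks: Sequence[Sequence[bool]],
    weights: Sequence[float],
) -> float:
    """Weighted negative log-likelihood over trajectories.

    token_logprobs[i][t] is log pi_theta of token t in trajectory i.
    policy_masks[i][t] is True only for tokens the model itself generated,
    so environment/user turns in a multi-turn trajectory are not imitated
    (an assumed masking convention).
    """
    if not (len(token_logprobs) == len(policy_masks) == len(weights)):
        raise ValueError("need one log-prob list, mask, and weight per trajectory")
    loss = 0.0
    for lps, mask, w in zip(token_logprobs, policy_masks, weights):
        if len(lps) != len(mask):
            raise ValueError("mask length must match token count")
        # Trajectory log-likelihood under pi_theta, counting policy tokens only.
        traj_logprob = sum(lp for lp, keep in zip(lps, mask) if keep)
        loss -= w * traj_logprob
    return loss
```

With `beta=1.0` and returns `[1.0, 0.0, 0.0]`, the first trajectory receives weight e/(e+2) ≈ 0.576 and the other two ≈ 0.212 each. The coefficient `beta` interpolates between two familiar regimes: as it shrinks, the weight concentrates on the highest-return trajectory (imitating only the best samples), and as it grows, the weights become uniform (plain SFT on reference-policy samples).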