technique
KL-Cov regularization
Status: active
kl-cov-regularization-30336f5d · 1 event · first seen 25d ago
Aliases: KL-Cov regularization
More like this (12)
KL-regularized RL
L0 regularization
KL Divergence
Entropy Regularization
R-Drop consistency regularization
reverse KL divergence
Entropy-Regularized Reinforcement Learning
ELBO variance minimization
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning
Divergence Regularized Policy Optimization
GLM-OCR
Recent events (1)
Two is better than one: A Collapse-free Multi-Reward RLIF Training Framework
This paper proposes a multi-reward reinforcement learning from internal feedback (RLIF) framework that decomposes the training signal into two parts: an answer-level reward obtained by cluster voting and a completion-level reward based on token-wise self-certainty. To counter reward hacking and entropy collapse, both common in single-reward RLIF, the authors introduce GDPO-based normalization and KL-Cov regularization targeting low-entropy token distributions. On mathematical reasoning and code-generation benchmarks, the method trains stably and reaches performance approaching that of supervised RLVR methods, without requiring external ground-truth supervision. The work advances scalable unsupervised RL training for LLM reasoning.
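The summary does not spell out how KL-Cov regularization is computed. What follows is a minimal sketch, assuming the formulation commonly used for LLM reasoning RL. For each response token, take the centered product of its log-probability and its advantage (a per-token covariance term). Select the top fraction of tokens by that value. Add a penalty beta * |log pi_theta - log pi_old| to the policy-gradient loss of only those tokens. The function name kl_cov_token_losses and the parameters k_frac and beta are hypothetical. PPO clipping, padding masks, and batching are omitted, and plain floats stand in for autograd tensors. The exact variant used in this paper may differ.

```python
import math
from typing import List


def kl_cov_token_losses(
    logp: List[float],      # log pi_theta(y_t) under the current policy, one per valid token
    old_logp: List[float],  # log pi_old(y_t) under the policy that generated the rollout
    adv: List[float],       # per-token advantage A_t
    k_frac: float = 0.002,  # hypothetical: fraction of tokens that receive the KL penalty
    beta: float = 1.0,      # hypothetical: KL penalty coefficient
) -> List[float]:
    """Sketch of KL-Cov: a policy-gradient loss plus a KL penalty on high-covariance tokens."""
    n = len(logp)
    if n == 0 or not (n == len(old_logp) == len(adv)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Importance ratio pi_theta / pi_old and the plain (unclipped) surrogate loss.
    ratios = [math.exp(lp - olp) for lp, olp in zip(logp, old_logp)]
    losses = [-a * r for a, r in zip(adv, ratios)]

    # Centered product of log-prob and advantage for each token.
    mean_lp = sum(logp) / n
    mean_adv = sum(adv) / n
    cov = [(lp - mean_lp) * (a - mean_adv) for lp, a in zip(logp, adv)]

    # Indices of the top-k tokens by covariance (always at least one token).
    k = max(1, int(n * k_frac))
    top = sorted(range(n), key=lambda i: cov[i], reverse=True)[:k]

    # Penalize drift from the old policy only on the selected tokens.
    for i in top:
        losses[i] += beta * abs(logp[i] - old_logp[i])
    return losses


if __name__ == "__main__":
    per_token = kl_cov_token_losses(
        logp=[-0.1, -2.0, -0.5, -1.0],
        old_logp=[-0.2, -1.8, -0.5, -0.8],
        adv=[1.0, -1.0, 0.5, 0.0],
        k_frac=0.25,
        beta=1.0,
    )
    # A trainer would average (or mask-average) these into one scalar loss.
    print(per_token, sum(per_token) / len(per_token))
```

Under this assumed formulation, restricting the penalty to the few highest-covariance tokens targets the updates that most strongly sharpen the policy's distribution. All other tokens keep the ordinary policy-gradient loss. In a real trainer, the covariance and top-k selection would run on detached tensors, and the per-token losses would be masked before averaging.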