Reinforcement Learning for Language Models
Type: technique · Status: active
ID: reinforcement-learning-for-language-models-89ca3f53 · 1 event · first seen 1mo ago
Aliases: Reinforcement Learning for Language Models
More like this (12)
- unsupervised language modeling
- generative language modeling
- Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
- large language model agents
- Multimodal Large Language Models
- multi-turn language models
- mRNA Language Model
- Large Language Models (frontier)
- large language models
- Scaling Laws for Neural Language Models
- Language Models are Few-Shot Learners
- Reasoning Language Models
Recent events (1)
GSPO: Group Sequence Policy Optimization for Scalable RL Training of Language Models
Qwen researchers introduce Group Sequence Policy Optimization (GSPO), a new RL algorithm designed to address the severe training instability and model collapse that existing methods such as GRPO (Group Relative Policy Optimization) exhibit during extended training runs. The core motivation is to make RL scale stably for language models, so that additional training compute translates into better reasoning and problem-solving. The paper targets a known bottleneck in post-training pipelines, where instability prevents further performance gains.
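The entry above does not describe how GSPO works. As a rough sketch, assuming the formulation in the GSPO paper as we understand it: instead of GRPO's per-token importance ratios, GSPO computes one length-normalized importance ratio per sampled response, (π_new(y|x) / π_old(y|x))^(1/|y|), and applies PPO-style clipping to whole sequences, combined with GRPO-style group-normalized advantages. The helper names (`group_advantages`, `sequence_ratio`, `gspo_objective`) are hypothetical, the use of population standard deviation is an assumption, and `eps=0.2` is an illustrative placeholder rather than a reported hyperparameter. The code only evaluates the scalar surrogate objective; a real trainer would compute it on tensors and backpropagate through the new policy's log-probabilities.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-relative advantages: standardize each reward within its group.
    Assumption: population std; a zero-variance group gets zero advantages."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence-level importance ratio.
    exp(mean_t(log pi_new - log pi_old)) == (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)."""
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(batch_logp_new, batch_logp_old, rewards, eps=0.2):
    """Clipped surrogate (to be maximized) averaged over a group of responses
    to one prompt. Each inner list holds per-token log-probabilities of one
    response. Clipping acts on each whole sequence's ratio, not on tokens."""
    advantages = group_advantages(rewards)
    terms = []
    for lp_new, lp_old, adv in zip(batch_logp_new, batch_logp_old, advantages):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        # Pessimistic PPO-style bound: take the smaller of the two surrogates.
        terms.append(min(s * adv, s_clipped * adv))
    return mean(terms)
```

With identical old and new log-probabilities every sequence ratio is exactly 1, so the objective reduces to the mean standardized advantage, which is zero up to rounding. Because the ratio is normalized by response length, a long response and a short one with the same average per-token drift receive the same ratio, which is the sequence-level behavior the sketch is meant to illustrate.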