technique
N-GRPO
active · provisional
n-grpo-426da1d7 · 1 event · first seen 7d ago
Aliases: N-GRPO
Recent events (1)
N-GRPO: Semantic Neighbor Mixing for Improved Policy Optimization in LLM Reasoning
A new arXiv preprint introduces N-GRPO, an exploration strategy for Group Relative Policy Optimization (GRPO), a reinforcement learning framework. To increase solution diversity during rollout, N-GRPO mixes the embeddings of anchor tokens with those of their nearest semantic neighbors, instead of relying on token-level sampling or random noise. The authors evaluate the method on DeepSeek-R1-Distill-Qwen models of several sizes and report consistent improvements on math reasoning benchmarks, along with better out-of-distribution generalization. The work targets a known limitation of RLHF-style training: redundant rollout trajectories that reduce the effective learning signal.
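To make the core idea concrete, here is a minimal sketch of blending an anchor token's embedding with its nearest semantic neighbors. It is an illustration under stated assumptions, not the preprint's implementation: the summary does not specify the similarity measure, the number of neighbors, or the mixing rule, so cosine similarity, the neighbor count `k`, the anchor weight `alpha`, the helper names, and the toy embedding table are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(table, anchor_id, k):
    """Return the ids of the k tokens most similar to the anchor (anchor excluded)."""
    anchor = table[anchor_id]
    scored = [
        (cosine(anchor, vec), tok)
        for tok, vec in enumerate(table)
        if tok != anchor_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [tok for _, tok in scored[:k]]

def mix_with_neighbors(table, anchor_id, k=2, alpha=0.8):
    """Blend the anchor embedding with the mean of its k nearest neighbors.

    alpha is the weight kept on the anchor; both alpha and the averaging
    rule are assumptions made for this sketch.
    """
    neighbors = nearest_neighbors(table, anchor_id, k)
    anchor = table[anchor_id]
    # Average the neighbor embeddings dimension by dimension.
    mean = [
        sum(table[n][d] for n in neighbors) / len(neighbors)
        for d in range(len(anchor))
    ]
    return [alpha * a + (1 - alpha) * m for a, m in zip(anchor, mean)]

# Toy 4-token embedding table; the values are invented for illustration.
TOY_TABLE = [
    [1.0, 0.0],   # token 0 (the anchor below)
    [0.9, 0.1],   # token 1, close to token 0
    [0.8, 0.3],   # token 2, fairly close to token 0
    [-1.0, 0.0],  # token 3, opposite direction
]

mixed = mix_with_neighbors(TOY_TABLE, anchor_id=0, k=2, alpha=0.8)
print(mixed)
```

In this toy setup the perturbed embedding stays close to the anchor and moves only toward semantically similar tokens, which is the intuition behind preferring neighbor mixing over random noise. How the mixed embedding is fed back into rollout generation is not described in the summary and is left out of the sketch.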