Entity · technique

N-GRPO

technique · active · n-grpo-426da1d7 · 1 event · first seen Jun 10, 2026

Aliases: N-GRPO

Co-occurring entities

DeepSeek-R1-Distill-Qwen · Semantic Neighbor Mixing · GRPO (Group Relative Policy Optimization)

More like this (12)

GRPO · AdvGRPO · Flow-GRPO · IH-GRPO · Off-Context GRPO · Dr. GRPO · GRPO (Group Relative Policy Optimization) · Latent-Anchored GRPO · Better Call GRPO · GraphGPO · AdaPrefix-GRPO · GSPO

Recent events (1)

arXiv · cs.CL · Jun 10, 2026

N-GRPO: Semantic Neighbor Mixing for Improved Policy Optimization in LLM Reasoning

A new arXiv preprint introduces N-GRPO, an exploration strategy for the GRPO reinforcement learning framework. To increase solution diversity during rollout, it mixes the embeddings of anchor tokens with those of their nearest semantic neighbors, rather than relying on token-level sampling or random noise. The method is evaluated on DeepSeek-R1-Distill-Qwen models of several sizes and is reported to improve consistently on math reasoning benchmarks and to generalize better out of distribution. The work targets a known limitation of RLHF-style training: redundant rollout trajectories that reduce the effective learning signal.
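The embedding-mixing idea in the summary can be illustrated with a minimal sketch. This is not the preprint's implementation: the function names `nearest_neighbors` and `mix_with_neighbors`, the parameters `k` and `alpha`, the use of cosine similarity to pick neighbors, and the uniform average over neighbors are all assumptions made here for illustration. It also assumes every row of the embedding table is non-zero.

```python
import numpy as np


def nearest_neighbors(emb, token_id, k):
    """Return the ids of the k rows of `emb` closest to row `token_id`.

    Closeness is cosine similarity (an assumption of this sketch).
    """
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed[token_id]
    sims[token_id] = -np.inf  # never pick the anchor as its own neighbor
    return np.argsort(-sims)[:k]


def mix_with_neighbors(emb, token_id, k=2, alpha=0.3):
    """Blend an anchor token's embedding with the mean of its k neighbors.

    alpha = 0 returns the anchor unchanged; alpha = 1 returns the neighbor
    mean. Both k and alpha are hypothetical knobs, not values from the paper.
    """
    neighbor_ids = nearest_neighbors(emb, token_id, k)
    neighbor_mean = emb[neighbor_ids].mean(axis=0)
    return (1.0 - alpha) * emb[token_id] + alpha * neighbor_mean


# Toy 2-D embedding table with four "tokens".
toy_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
mixed = mix_with_neighbors(toy_emb, token_id=0, k=1, alpha=0.5)
print(mixed)
```

In this toy table the closest neighbor of token 0 is token 1, so the mixed vector lies halfway between the two. In an actual rollout, a vector like `mixed` would presumably be fed to the model in place of the anchor token's embedding, which perturbs generation toward semantically related continuations instead of adding unstructured noise.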

Evaluation and Benchmarking · Alignment and RLHF · N-GRPO · DeepSeek-R1-Distill-Qwen · Semantic Neighbor Mixing