Entity · technique

AdvGRPO

technique · active · advgrpo-84697b44 · 1 event · first seen Jun 9, 2026

Aliases: AdvGRPO


More like this (12)

GRPO · N-GRPO · Flow-GRPO · GRPO (Group Relative Policy Optimization) · Off-Context GRPO · AdaPrefix-GRPO · IH-GRPO · Better Call GRPO · Dr. GRPO · Latent-Anchored GRPO · GraphGPO · GRU

Recent events (1)

arXiv · cs.CL · Jun 9, 2026

AdvGRPO: Stable co-training framework for adaptive red teaming of language models

Researchers introduce AdvGRPO, a co-training framework that makes GRPO viable for jointly optimizing an attacker and a defender in LLM red teaming, addressing the instability previously reported in this setting. The method combines dense multi-channel rewards and decoupled advantage normalization with a curriculum that progresses from single-turn to multi-turn attacks before bootstrapping co-training. Co-trained defenders outperform baselines on safety benchmarks, and the learned attacks transfer across models.
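The summary does not say exactly how the advantages are normalized. As a rough illustration only, the Python sketch below contrasts standard GRPO group-relative advantages (each completion's reward z-scored against the other completions sampled for the same prompt), computed over one summed reward, with a hypothetical "decoupled" variant that normalizes each reward channel separately before combining them. The per-channel reading of "decoupled advantage normalization", the function names, the channel names, and the equal weights are all assumptions, not the paper's method. The term could instead mean, for example, normalizing attacker and defender advantages separately.

```python
import statistics
from typing import Dict, List

EPS = 1e-6  # guards against division by zero when every reward in a group is equal


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Standard GRPO-style advantage: z-score each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + EPS) for r in rewards]


def coupled_advantages(channels: Dict[str, List[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Baseline: sum weighted channels into one scalar reward, then normalize once.
    Channels with a larger numeric scale dominate the resulting advantages."""
    n = len(next(iter(channels.values())))
    totals = [sum(weights[c] * channels[c][i] for c in channels) for i in range(n)]
    return group_relative_advantages(totals)


def decoupled_advantages(channels: Dict[str, List[float]],
                         weights: Dict[str, float]) -> List[float]:
    """Hypothetical decoupled variant: normalize each channel within the group
    first, so every channel contributes on a comparable scale, then combine."""
    n = len(next(iter(channels.values())))
    per_channel = {c: group_relative_advantages(v) for c, v in channels.items()}
    return [sum(weights[c] * per_channel[c][i] for c in channels) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical group of 4 sampled completions scored on two reward channels
    # with very different numeric ranges.
    channels = {
        "binary_success": [1.0, 0.0, 1.0, 0.0],
        "graded_score": [0.0, 10.0, 20.0, 30.0],
    }
    weights = {"binary_success": 1.0, "graded_score": 1.0}
    print("coupled:  ", coupled_advantages(channels, weights))
    print("decoupled:", decoupled_advantages(channels, weights))
```

In this toy group, the coupled version ranks the last completion highest because the wide-range `graded_score` channel swamps the binary one. The decoupled version ranks the third completion highest, since it scores well on both channels once each channel is put on the same scale. This scale imbalance is one plausible reason to decouple normalization when rewards are dense and multi-channel.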

AI Safety Research · Alignment and RLHF · AdvGRPO · GRPO · PPO