benchmark

CharacterEval

benchmark · active · provisional · charactereval-22e19fc2 · 1 event · first seen 2d ago

Aliases: CharacterEval

Co-occurring entities

GRPO · Psy-CoT · Role-Aware Policy Optimization · CoSER · CharacterBench

More like this (12)

ValueEval · ParaEval · Every Eval Ever · T-Eval · TweetEval · HumanEval · ProActEval · G-Eval · SummEval · IndicContextEval · UniEval · Codex HumanEval

Recent events (1)

4 · arXiv · cs.CL · 2d ago

Psy-CoT and RAPO: Psychology-grounded reasoning and role-aware RL for character-faithful role-playing agents

Researchers propose Psy-CoT, a chain-of-thought framework that decomposes role-playing reasoning into three psychology-grounded steps (Interaction Perception, Psychological Empathy, and Logical Construction), aiming to improve out-of-distribution generalization beyond surface mimicry. They also introduce Role-Aware Policy Optimization (RAPO), a reinforcement learning method that weights gradients asymmetrically using profile–token mutual information. This targets a form of reward hacking in which generic phrases receive the same reward signal as role-specific ones. In experiments on CoSER, CharacterBench, and CharacterEval, Psy-CoT outperforms existing role-playing CoT methods, and RAPO consistently beats GRPO across model scales. Overall, the work tackles a known failure mode of SFT-based role-playing agents and offers a targeted RL fix for reward-model exploitation.
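The summary contrasts RAPO with GRPO but does not give RAPO's formulas. The Python below is a minimal sketch, not the paper's method, of what "profile–token mutual information" weighting on top of a GRPO-style advantage could look like. Only the group-normalized advantage follows the common GRPO form. The PMI proxy (log-probability of a token with versus without the character profile), the tanh-bounded weight, the `alpha` parameter, and the choice to reweight only positively rewarded responses are all hypothetical assumptions made for illustration.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: normalize each sampled response's scalar reward
    by the mean and (population) standard deviation of its sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def token_pmi(logp_with_profile: Sequence[float],
              logp_without_profile: Sequence[float]) -> List[float]:
    """Hypothetical proxy for profile-token mutual information:
    log p(token | profile, context) - log p(token | context).
    Generic tokens score near 0; profile-dependent tokens score higher."""
    return [a - b for a, b in zip(logp_with_profile, logp_without_profile)]


def role_aware_token_advantages(pmi: Sequence[float], advantage: float,
                                alpha: float = 1.0) -> List[float]:
    """Hypothetical asymmetric weighting (an assumption, not RAPO itself):
    for a positively rewarded response, profile-dependent tokens (pmi > 0)
    get a weight between 1 and 1 + alpha (bounded by tanh), while generic
    tokens keep weight 1. For a non-positive advantage, every token keeps
    the plain GRPO advantage."""
    if advantage <= 0:
        return [advantage for _ in pmi]
    return [advantage * (1.0 + alpha * math.tanh(max(p, 0.0))) for p in pmi]


if __name__ == "__main__":
    # Four sampled responses to one prompt, scored by a reward model.
    advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    # Two tokens of response 0: one generic, one tied to the character profile.
    pmi = token_pmi([-1.0, -2.0], [-1.0, -4.0])
    print(role_aware_token_advantages(pmi, advs[0]))
```

In this sketch a generic token and a role-specific token in the same well-rewarded response no longer receive identical credit, which is the reward-hacking gap the summary describes. If every response in a group receives the same reward, the numerator is zero and all advantages are 0; `eps` only guards against dividing by a zero standard deviation.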

Agent and Tool Ecosystem · Alignment and RLHF · GRPO · CharacterEval · Psy-CoT