Turing-RL
technique · active · provisional
turing-rl-98018ebd · 1 event · first seen 2d ago · Aliases: Turing-RL
Recent events (1)
Turing-RL: Reinforcement learning with Turing-Test-based rewards for user simulator training
Researchers propose Turing-RL, a method for training LLM-based user simulators using a discriminative reward signal that scores how indistinguishable generated responses are from real user responses, rather than matching a single ground-truth output. An LLM judge evaluates indistinguishability given the user's history, and the simulator is trained via RL to maximize this reward. Evaluated on conversational chat and Reddit forum discussion domains, Turing-RL outperforms log-probability and similarity-reward baselines on both LLM and human evaluation metrics. The work has implications for agent assistant training, personalization system evaluation, and social science research.
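The reward idea in the summary can be illustrated with a toy sketch. It is not the authors' implementation: `toy_judge` is a hypothetical word-overlap stand-in for the LLM judge, the policy is a plain categorical distribution over two fixed candidate replies rather than an LLM, and `reinforce_step` is a generic exact policy-gradient update chosen only for illustration. The point is that the reward comes from a judge scoring how plausibly a reply belongs to this user given their history, not from comparison with a single ground-truth reply.

```python
import math
from typing import Callable, List

# Hypothetical judge signature: (user_history, candidate) -> score in [0, 1],
# read as "how plausibly did the real user write this?". The summary describes
# an LLM judge in this role; the stub below is only a stand-in.
Judge = Callable[[List[str], str], float]


def toy_judge(history: List[str], candidate: str) -> float:
    """Stand-in judge: fraction of candidate words that appear in the history."""
    vocab = {w for turn in history for w in turn.lower().split()}
    words = candidate.lower().split()
    if not words:
        return 0.0
    return sum(w in vocab for w in words) / len(words)


def turing_reward(judge: Judge, history: List[str], candidate: str) -> float:
    """Reward is the judge's indistinguishability score; no ground truth needed."""
    return judge(history, candidate)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_step(logits: List[float], rewards: List[float], lr: float = 1.0) -> List[float]:
    """One exact policy-gradient step on expected reward for a categorical policy.

    d E[r] / d logit_i = p_i * (r_i - E[r]), so replies the judge scores above
    the current average gain probability mass.
    """
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [l + lr * p * (r - baseline) for l, p, r in zip(logits, probs, rewards)]


# Illustrative data (invented for this sketch).
history = ["honestly i just want cheap flights", "i hate long layovers"]
candidates = [
    "I would be delighted to book a premium itinerary.",
    "i just want cheap flights without long layovers",
]

rewards = [turing_reward(toy_judge, history, c) for c in candidates]

logits = [0.0, 0.0]  # start from a uniform policy over the two candidates
for _ in range(50):
    logits = reinforce_step(logits, rewards)
final_probs = softmax(logits)
```

In this toy run the second candidate reuses the user's own phrasing, so the stand-in judge scores it higher and the policy shifts probability toward it. A real setup would replace `toy_judge` with a call to an LLM judge and the categorical policy with the simulator model being trained.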