technique

Turing-RL

technique · active · provisional · turing-rl-98018ebd · 1 event · first seen 2d ago

Aliases: Turing-RL

More like this (12)

ExpRL · Recursive Language Models (RLMs) · TRL (Transformer Reinforcement Learning) · Competitive Programming RL · ContextRL · Turing completeness · KL-regularized RL · ReuseRL · ExpRL: Exploratory RL for LLM Mid-Training · PrefixRL · MemRL · SafeRL-Lab

Recent events (1)

5 · arXiv · cs.CL · 2d ago

Turing-RL: Reinforcement learning with Turing-Test-based rewards for user simulator training

Researchers propose Turing-RL, a method for training LLM-based user simulators with a discriminative reward signal. Instead of rewarding a match to a single ground-truth output, the reward scores how indistinguishable a generated response is from real user responses. An LLM judge assesses this indistinguishability given the user's history, and the simulator is trained via RL to maximize the resulting reward. In two domains, conversational chat and Reddit forum discussions, Turing-RL outperforms log-probability and similarity-reward baselines under both LLM-based and human evaluation. Potential applications include training assistant agents, evaluating personalization systems, and social science research.
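A minimal sketch of the reward-and-update loop summarized above, under several assumptions. The judge is abstracted as a hypothetical function returning the probability that a response was written by the real user, given their history. `toy_judge` is a placeholder word-overlap heuristic standing in for an actual LLM judge call; it is not a meaningful discriminator. The summary does not name the RL algorithm, so the group-normalized advantage and REINFORCE-style surrogate loss are assumed choices. All function names here are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

# Hypothetical judge interface: (user_history, candidate_response) -> P(written by the real user).
Judge = Callable[[Sequence[str], str], float]


def turing_reward(judge: Judge, history: Sequence[str], response: str) -> float:
    """Reward = the judge's estimate that `response` is indistinguishable from the real user."""
    p = judge(history, response)
    # Clamp in case the judge's parsed output falls outside [0, 1].
    return min(1.0, max(0.0, p))


def toy_judge(history: Sequence[str], response: str) -> float:
    """Placeholder for an LLM judge: fraction of response words seen in the user's history.

    This heuristic is NOT a real discriminator; swap in a prompted LLM call.
    """
    past = {w.lower() for msg in history for w in msg.split()}
    words = [w.lower() for w in response.split()]
    if not words:
        return 0.0
    return sum(w in past for w in words) / len(words)


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards of several samples for the same history (assumed GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def policy_gradient_loss(logprobs: Sequence[float], advantages: Sequence[float]) -> float:
    """REINFORCE surrogate: minimizing it raises log-probs of high-advantage samples."""
    return -mean(a * lp for a, lp in zip(advantages, logprobs))


if __name__ == "__main__":
    history = ["lol yeah i saw that", "ngl the ending was mid"]
    samples = ["yeah the ending was mid lol", "I found the conclusion quite unsatisfying."]
    rewards = [turing_reward(toy_judge, history, s) for s in samples]
    advantages = group_advantages(rewards)
    # Log-probs would come from the simulator policy; these values are made up for illustration.
    logprobs = [-12.3, -9.8]
    print(rewards, advantages, policy_gradient_loss(logprobs, advantages))
```

In a real setup, `toy_judge` would be replaced by a prompted LLM whose output is parsed into a probability, and the loss would be computed over the simulator's token log-probabilities in an autodiff framework. The key contrast with the similarity baseline is that the reward depends only on whether the judge finds the response plausible for this user, not on closeness to one reference reply.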

Evaluation and Benchmarking · Agent and Tool Ecosystem · Turing-RL