Triadic Werewolf

benchmark · active · provisional · triadic-werewolf-be52697e · 1 event · first seen 15h ago

Aliases: Triadic Werewolf

Recent events (1)

5 · arXiv · cs.CL · 15h ago

Triadic Werewolf benchmark exposes multi-hop Theory of Mind failures in LLMs

Researchers introduce a Werewolf variant that adds a Jester faction with an inverted utility function: the Jester wins by being voted out. Models must therefore reason across three opposing incentive structures at once. Across 60 games, GPT-4.1, DeepSeek-V3.1, and Llama-3.3-70B all struggle. Werewolves never exceed a 20% win rate, and GPT-4.1 wolves vote out the Jester in 60-70% of games, a self-defeating action because it hands the Jester the win. Only DeepSeek-V3.1 learns the nuanced strategy of appearing suspicious without appearing intentionally suspicious, and it benefits most from self-learning. The work argues that dyadic social-deduction benchmarks systematically underestimate the difficulty of multi-agent Theory of Mind.
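
To make the three incentive structures concrete, here is a minimal Python sketch of how a day vote might be scored. It is an illustration, not the benchmark's implementation. Only the Jester rule (winning by being voted out) comes from the summary above. The Villager and Werewolf conditions are the conventional Werewolf rules and are assumed here, as is counting the Jester among the non-wolves. The names `Faction` and `winner_after_vote` are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class Faction(Enum):
    VILLAGER = "villager"
    WEREWOLF = "werewolf"
    JESTER = "jester"


def winner_after_vote(alive: List[Faction], voted_out: Faction) -> Optional[Faction]:
    """Return the winning faction after a day vote, or None if play continues.

    `alive` holds the factions of all living players before the vote and
    must contain `voted_out`.
    """
    # Inverted utility (from the summary): the Jester wins by being voted out.
    if voted_out is Faction.JESTER:
        return Faction.JESTER

    remaining = list(alive)
    remaining.remove(voted_out)
    wolves = remaining.count(Faction.WEREWOLF)
    others = len(remaining) - wolves  # assumption: a living Jester counts here

    if wolves == 0:
        return Faction.VILLAGER  # assumed conventional rule: no wolves left
    if wolves >= others:
        return Faction.WEREWOLF  # assumed conventional rule: wolves reach parity
    return None


# Example table: two wolves, three villagers, one Jester.
table = [Faction.WEREWOLF] * 2 + [Faction.VILLAGER] * 3 + [Faction.JESTER]
print(winner_after_vote(table, Faction.JESTER))    # Faction.JESTER
print(winner_after_vote(table, Faction.VILLAGER))  # None
```

Under these assumed rules, any vote that removes the Jester ends the game in the Jester's favor regardless of who cast it, which is why wolves voting the Jester out is self-defeating.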