text games
other · active · provisional
text-games-8f7ddaa1 · 1 event · first seen 15d ago
Aliases: text games
Recent events (1)
HERO'S JOURNEY: A Benchmark for Complex Rule Induction in Text-Based Goal-Directed Tasks
HERO'S JOURNEY is a new benchmark evaluating rule induction capabilities of LLMs across eight tasks spanning attribute and procedural induction families, each with four structural rule forms and controllable lexical grounding. Agents must infer hidden rules from demonstrations and execute multi-step plans accordingly. Evaluation of state-of-the-art LLMs reveals limited and uneven rule induction ability, with process execution creating a bottleneck and surface semantics having minimal effect. Induction-specific steering methods improve attribute tasks but fail to reliably help procedural tasks, leaving procedural induction as an open challenge.