Almanac
technique

rule induction

technique · active · provisional · rule-induction-b12a64bf · 1 event · first seen 15d ago

Aliases: rule induction

Recent events (1)

5 · arXiv · cs.CL · 15d ago

HERO'S JOURNEY: A Benchmark for Complex Rule Induction in Text-Based Goal-Directed Tasks

HERO'S JOURNEY is a new benchmark evaluating rule induction capabilities of LLMs across eight tasks spanning attribute and procedural induction families, each with four structural rule forms and controllable lexical grounding. Agents must infer hidden rules from demonstrations and execute multi-step plans accordingly. Evaluation of state-of-the-art LLMs reveals limited and uneven rule induction ability, with process execution creating a bottleneck and surface semantics having minimal effect. Induction-specific steering methods improve attribute tasks but fail to reliably help procedural tasks, leaving procedural induction as an open challenge.
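
To make "inferring a hidden rule from demonstrations" concrete, here is a minimal sketch of attribute rule induction by hypothesis elimination. It is not the benchmark's method or data format: the item attributes, the two rule shapes (a single attribute test and a two-test conjunction), and the function names `candidate_rules`, `matches`, and `induce` are all assumptions chosen for illustration.

```python
from itertools import combinations

def candidate_rules(demos):
    """Enumerate hypothetical rules: single attribute=value tests
    and conjunctions of two tests on different attributes."""
    atoms = sorted({(k, v) for item, _ in demos for k, v in item.items()})
    rules = [(a,) for a in atoms]
    rules += [pair for pair in combinations(atoms, 2)
              if pair[0][0] != pair[1][0]]
    return rules

def matches(rule, item):
    """A rule accepts an item when every attribute test in it holds."""
    return all(item.get(k) == v for k, v in rule)

def induce(demos):
    """Keep every candidate rule that agrees with all labeled demonstrations."""
    return [r for r in candidate_rules(demos)
            if all(matches(r, item) == label for item, label in demos)]

# Hypothetical demonstrations: (item attributes, whether the hidden rule accepted it).
demos = [
    ({"color": "red", "shape": "key"}, True),
    ({"color": "red", "shape": "coin"}, False),
    ({"color": "blue", "shape": "key"}, True),
    ({"color": "blue", "shape": "coin"}, False),
]

consistent = induce(demos)
print(consistent)
```

With these four demonstrations only the rule "shape is key" survives, so an agent following it would accept an unseen green key. With fewer demonstrations several rules can remain consistent, which is one simple way to see why induction from limited evidence is hard.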