Operadic consistency: a label-free signal for compositional reasoning failures in LLMs

paper · active · provisional · operadic-consistency-a-label-free-signal-for-compositional-reasoning-failures-in-llms-62f15a5f · 1 event · first seen 5d ago

Recent events (1)

6 · arXiv · cs.LG · 5d ago

Operadic consistency: a label-free signal for detecting compositional reasoning failures in LLMs

Researchers introduce operadic consistency (OC), a label-free inference-time signal that checks whether an LLM's direct answer to a compositional query agrees with the answer obtained by composing its own stated decomposition of that query. Evaluated on 12 instruction-tuned LLMs (4B–671B parameters) across four multi-hop QA datasets, OC correlates strongly with accuracy on every dataset (Pearson r ∈ [0.86, 0.94]) and is more robust across datasets than self-consistency, semantic entropy, and P(True). At the per-question level, OC carries information beyond these baselines and improves selective prediction (AUARC gains of +0.086 to +0.096, AUROC gains of +0.092 to +0.164) at equal sampling cost. The results extend to frontier thinking models that use chain-of-thought decompositions.
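
The summary describes the check only at a high level, so the following is a minimal Python sketch of the idea, not the paper's implementation. It assumes two hypothetical callables wrapping the model, `answer(question)` and `decompose(question)`. It also assumes that sub-questions refer to earlier sub-answers with `#1`, `#2`, ... placeholders, and that agreement is judged by normalized exact match. None of these details come from the source.

```python
import re
import string
from typing import Callable, List

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def composed_answer(sub_questions: List[str], answer: Callable[[str], str]) -> str:
    """Answer sub-questions in order and return the last sub-answer.

    Assumption: "#k" in a sub-question stands for the answer to sub-question k.
    """
    if not sub_questions:
        raise ValueError("decomposition is empty")
    answers: List[str] = []
    for sub_q in sub_questions:
        # A function replacement avoids escape handling in the substituted text.
        filled = re.sub(r"#(\d+)", lambda m: answers[int(m.group(1)) - 1], sub_q)
        answers.append(answer(filled))
    return answers[-1]

def oc_agree(
    question: str,
    decompose: Callable[[str], List[str]],
    answer: Callable[[str], str],
) -> bool:
    """True if the direct answer matches the answer composed from the model's own decomposition."""
    direct = answer(question)
    composed = composed_answer(decompose(question), answer)
    return normalize(direct) == normalize(composed)

def oc_rate(
    questions: List[str],
    decompose: Callable[[str], List[str]],
    answer: Callable[[str], str],
) -> float:
    """Fraction of questions on which the direct and composed answers agree."""
    if not questions:
        raise ValueError("no questions given")
    flags = [oc_agree(q, decompose, answer) for q in questions]
    return sum(flags) / len(flags)
```

No gold labels appear anywhere in this sketch: both answers come from the model itself, which is what makes the signal label-free. The per-question flag from `oc_agree` could serve as a confidence score for selective prediction, while `oc_rate` gives a dataset-level value that could be compared against accuracy.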