paper
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning
paper · active · provisional
from-correctness-to-utility-gain-based-prefix-evaluation-for-llm-reasoning-9b206b83 · 1 event · first seen 9d ago
Aliases: From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning
More like this (12)
Operads for compositional reasoning in LLMs
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
Operadic consistency: a label-free signal for compositional reasoning failures in LLMs
Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs
Quantifying Faithful Confidence Expression in Large Reasoning Models
Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning
Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Reasoning Enhancement
Prefix Utility Model
Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
Recent events (1)
Prefix Utility Model (PUM) trains process reward models on outcome-grounded prefix gain rather than step correctness
A new arXiv preprint proposes replacing the local step-correctness signals used in process reward models with "prefix gain": the improvement in solve rate induced by conditioning a student model on a given reasoning prefix. The authors train a Prefix Utility Model (PUM) with a pairwise ranking objective and evaluate it in three settings on mathematical reasoning tasks: Best-of-N selection, beam search, and RL. PUM shows particular strength when candidate pools are large, search budgets are high, or rule-based rewards are sparse. Code, data, and models are released publicly.
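The summary's three ideas (prefix gain, a pairwise ranking objective, and Best-of-N selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that gain is measured as the difference between the student's sampled solve rate with and without the prefix, and that the ranking objective is a logistic (Bradley-Terry style) pairwise loss; the summary specifies neither detail. The names `Student`, `solve_rate`, `prefix_gain`, `pairwise_ranking_loss`, `best_of_n`, and `toy_student` are hypothetical.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: a "student" samples one completion for a prompt
# and reports whether the final answer was correct.
Student = Callable[[str], bool]


def solve_rate(student: Student, prompt: str, n_samples: int) -> float:
    """Fraction of n_samples rollouts from `prompt` that end in a correct answer."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return sum(1 for _ in range(n_samples) if student(prompt)) / n_samples


def prefix_gain(student: Student, question: str, prefix: str,
                n_samples: int = 64) -> float:
    """Assumed definition: solve rate with the prefix minus solve rate without it."""
    baseline = solve_rate(student, question, n_samples)
    conditioned = solve_rate(student, question + "\n" + prefix, n_samples)
    return conditioned - baseline


def pairwise_ranking_loss(score_hi: float, score_lo: float) -> float:
    """Logistic pairwise loss, -log(sigmoid(score_hi - score_lo)).

    `score_hi` is the model's score for the prefix with the larger measured
    gain. The form max(-m, 0) + log1p(exp(-|m|)) avoids overflow.
    """
    margin = score_hi - score_lo
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def best_of_n(candidates: Sequence[str], scorer: Callable[[str], float]) -> str:
    """Return the candidate that the scorer ranks highest."""
    return max(candidates, key=scorer)


# Toy usage with a deterministic stand-in for a real student model.
def toy_student(prompt: str) -> bool:
    # Pretend the student only succeeds when told to factor.
    return "factor the quadratic" in prompt


question = "Solve x^2 - 5x + 6 = 0."
prefixes = ["First, factor the quadratic.", "Let us guess."]
gains = {p: prefix_gain(toy_student, question, p, n_samples=8) for p in prefixes}
best_prefix = best_of_n(prefixes, lambda p: gains[p])
```

In this sketch the measured gains are used only to decide which prefix in a pair should be ranked higher; a trained scorer would then replace the `gains` lookup when selecting among candidates at inference time.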