Entity · paper

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

paper · active · from-correctness-to-utility-gain-based-prefix-evaluation-for-llm-reasoning-9b206b83 · 1 event · first seen Jun 8, 2026

Co-occurring entities

Prefix Utility Model

More like this (12)

Logical Judgments Under Pressure: Diagnosing Syllogistic Stability with Learned Soft Prefixes
Groc-PO: Grounded Context Preference Optimization for Truthful Multimodal LLMs
Operads for compositional reasoning in LLMs
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
Two Axes of LLM Abstention: Answer Correctness and Question Answerability
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
When are likely answers right? On Sequence Probability and Correctness in LLMs
Win by Silence: Deletion Non-Monotonicity, Autonomous Exploitation, and Typed-State Gating in LLM Plan Evaluation
Knowledge Knows, Verbalization Tells: Disentangling Latent Directions for Mathematical Solvability in LLMs
Can LLMs Judge Better Than They Generate? Evaluating Task Asymmetry, Mechanistic Interpretability and Transferability for In-Context QA
CheckRLM: Effective Knowledge-Thought Coherence Checking in Retrieval-Augmented Reasoning
Modality-Informed Reciprocal Reasoning Optimization

Recent events (1)

6 · arXiv · cs.CL · Jun 8, 2026

Prefix Utility Model (PUM) trains process reward models on outcome-grounded prefix gain rather than step correctness

A new arXiv preprint proposes replacing local step-correctness signals in process reward models with "prefix gain": the improvement in solve rate induced by conditioning a student model on a given reasoning prefix. The authors train a Prefix Utility Model (PUM) with a pairwise ranking objective and evaluate it on mathematical reasoning tasks across Best-of-N selection, beam search, and reinforcement learning (RL). PUM is reported to be particularly strong when candidate pools are large, search budgets are high, or rule-based rewards are sparse. Code, data, and models are released publicly.
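To make the summary concrete, here is a minimal Python sketch of the three ideas it names: prefix gain, a pairwise ranking loss, and Best-of-N selection. It is an illustration under stated assumptions, not the authors' implementation. The summary does not say what baseline the gain is measured against or which ranking loss is used, so the sketch assumes the baseline is the student's solve rate without the prefix and substitutes a standard logistic (Bradley-Terry style) pairwise loss. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def solve_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled rollouts that reached a correct final answer."""
    return sum(outcomes) / len(outcomes)


def prefix_gain(with_prefix: Sequence[bool], without_prefix: Sequence[bool]) -> float:
    """Solve rate when conditioned on the prefix minus the unconditioned solve rate.

    ASSUMPTION: the baseline is the student sampled without any prefix;
    the summary does not specify the baseline.
    """
    return solve_rate(with_prefix) - solve_rate(without_prefix)


def pairwise_ranking_loss(score_higher_gain: float, score_lower_gain: float) -> float:
    """Logistic pairwise loss, -log(sigmoid(s_hi - s_lo)).

    ASSUMPTION: a generic ranking loss standing in for the paper's
    unspecified pairwise ranking objective. It is small when the model
    scores the higher-gain prefix above the lower-gain one.
    """
    margin = score_higher_gain - score_lower_gain
    return math.log1p(math.exp(-margin))


def best_of_n(candidates: Sequence[str], scorer: Callable[[str], float]) -> str:
    """Return the candidate that the scorer rates highest."""
    return max(candidates, key=scorer)


if __name__ == "__main__":
    # Toy rollout outcomes: 3 of 4 solved with the prefix, 1 of 4 without.
    gain = prefix_gain([True, True, True, False], [True, False, False, False])
    print(f"prefix gain: {gain:.2f}")
    print(f"loss, correct order: {pairwise_ranking_loss(2.0, 0.0):.3f}")
    print(f"loss, wrong order:   {pairwise_ranking_loss(0.0, 2.0):.3f}")
```

In this framing a prefix can be locally correct yet have near-zero or negative gain, which is the distinction from step correctness that the summary draws. In practice `scorer` would be a trained utility model rather than a hand-written function.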

Evaluation and Benchmarking · Alignment and RLHF · From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning · Prefix Utility Model