Entity · technique

Prefix Utility Model

technique · active · prefix-utility-model-8fcea6e2 · 1 event · first seen Jun 8, 2026

Aliases: Prefix Utility Model

Co-occurring entities

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

More like this (12)

Attainable Utility Preservation · Universal Dependencies · foundation models · From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning · Unified Harm Framework · Universal Commerce Protocol · Counterfactual Trajectory Utility · Fine-Tuning for Financial Utility · topic modelling · Bridge Evidence: Static Retrieval Utility Does Not Predict Causal Utility in Multi-Step Agentic Search · VPT Model · Meta Model API

Recent events (1)

arXiv · cs.CL · Jun 8, 2026

Prefix Utility Model (PUM) trains process reward models on outcome-grounded prefix gain rather than step correctness

A new arXiv preprint proposes replacing the local step-correctness signal used to train process reward models with "prefix gain": the improvement in solve rate obtained when a student model is conditioned on a given reasoning prefix. The authors train a Prefix Utility Model (PUM) with a pairwise ranking objective and evaluate it in three settings on mathematical reasoning tasks: Best-of-N selection, beam search, and RL. PUM is reported to be strongest when candidate pools are large, search budgets are high, or rule-based rewards are sparse. Code, data, and models are released publicly.
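The summary above does not give the preprint's exact estimator or loss, so the following is only a minimal sketch under stated assumptions: prefix gain is taken to be the solve rate over sampled rollouts with the prefix minus the solve rate without it, and the pairwise ranking objective is taken to be a logistic (Bradley-Terry style) loss on the score difference. All function names here (`solve_rate`, `prefix_gain`, `pairwise_ranking_loss`, `best_of_n`) are hypothetical and are not taken from the released code.

```python
import math
from typing import Callable, Sequence


def solve_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled rollouts that ended in a correct final answer."""
    if not outcomes:
        raise ValueError("need at least one rollout")
    return sum(outcomes) / len(outcomes)


def prefix_gain(with_prefix: Sequence[bool], without_prefix: Sequence[bool]) -> float:
    """Assumed definition: solve rate given the prefix minus the baseline solve rate."""
    return solve_rate(with_prefix) - solve_rate(without_prefix)


def pairwise_ranking_loss(score_hi: float, score_lo: float) -> float:
    """Assumed objective: -log(sigmoid(score_hi - score_lo)).

    score_hi is the model's score for the prefix with the larger measured gain.
    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = score_hi - score_lo
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def best_of_n(candidates: Sequence[str], scorer: Callable[[str], float]) -> str:
    """Best-of-N selection: return the candidate the scorer ranks highest."""
    return max(candidates, key=scorer)
```

Under these assumptions, a prefix that lifts a student from 1 correct rollout in 4 to 3 correct in 4 has a gain of 0.5, and the ranking loss shrinks as the model scores higher-gain prefixes further above lower-gain ones. In practice the scorer passed to `best_of_n` would be the trained model rather than a hand-written function.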

Evaluation and Benchmarking · Alignment and RLHF · From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning · Prefix Utility Model