technique
Prefix Utility Model
technique · active · provisional
prefix-utility-model-8fcea6e2 · 1 event · first seen 9d ago
Aliases: Prefix Utility Model
More like this (12)
Attainable Utility Preservation
Universal Dependencies
foundation models
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning
Unified Harm Framework
Universal Commerce Protocol
Fine-Tuning for Financial Utility
topic modelling
VPT Model
Unified Multimodal Models (UMMs)
One Useful Thing
Local Modality Substitution
Recent events (1)
Prefix Utility Model (PUM) trains process reward models on outcome-grounded prefix gain rather than step correctness
A new arXiv preprint proposes replacing local step-correctness signals in process reward models with 'prefix gain': the improvement in solve rate induced by conditioning a student model on a given reasoning prefix. The authors train a Prefix Utility Model (PUM) with a pairwise ranking objective and evaluate it on mathematical reasoning tasks across Best-of-N selection, beam search, and RL. PUM is reported to be strongest when candidate pools are large, search budgets are high, or rule-based rewards are sparse. Code, data, and models are publicly released.
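To make the two ideas in this summary concrete, here is a minimal Python sketch, not the preprint's implementation. It assumes prefix gain is estimated as the difference between the student's empirical solve rate with and without the prefix, and that the pairwise ranking objective is a logistic (Bradley-Terry style) loss on the score difference. Neither detail is confirmed by the summary, and the function names `solve_rate`, `prefix_gain`, and `pairwise_ranking_loss` are hypothetical.

```python
import math
from typing import Sequence


def solve_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled completions that reached a correct final answer."""
    if not outcomes:
        raise ValueError("need at least one rollout")
    return sum(outcomes) / len(outcomes)


def prefix_gain(with_prefix: Sequence[bool], without_prefix: Sequence[bool]) -> float:
    """Assumed definition: solve rate when conditioned on the prefix minus
    the baseline solve rate on the same problem without it."""
    return solve_rate(with_prefix) - solve_rate(without_prefix)


def pairwise_ranking_loss(score_hi: float, score_lo: float) -> float:
    """Assumed objective: -log(sigmoid(score_hi - score_lo)), where score_hi is
    the model's score for the prefix with the larger measured gain.
    Written as a numerically stable softplus of the negated margin."""
    margin = score_hi - score_lo
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Illustrative rollouts (made-up data): True means the student solved the problem.
baseline = [True, False, False, False]
gain_a = prefix_gain([True, True, False, True], baseline)    # helpful prefix
gain_b = prefix_gain([False, False, True, False], baseline)  # neutral prefix

# Prefix A has the larger gain, so a ranking model should score it higher.
loss_correct_order = pairwise_ranking_loss(score_hi=2.0, score_lo=0.0)
loss_wrong_order = pairwise_ranking_loss(score_hi=0.0, score_lo=2.0)
```

Under these assumptions the training signal depends only on which of two prefixes yields the higher measured gain, so a locally correct step that does not improve the student's solve rate earns no preference over a neutral one.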