Process Reward Model

technique · active · process-reward-model-6c172dc9 · 1 event · first seen 1mo ago

Aliases: Process Reward Model

Recent events (1)

6 · Qwen Research · 1mo ago

Qwen2.5-Math Process Reward Model for Mathematical Reasoning Supervision

Alibaba's Qwen team introduces a process reward model (PRM) that aims to make mathematical reasoning in LLMs more reliable by supervising each intermediate reasoning step rather than only the final answer. It targets a specific failure mode: models that reach a correct conclusion through plausible but flawed intermediate derivations. The release includes model weights on Hugging Face and ModelScope, alongside a GitHub repository.
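To make the step-level idea concrete, here is a minimal sketch of how per-step scores might be used to pick among candidate solutions. It is not the Qwen2.5-Math PRM interface: `score_steps`, `aggregate`, `best_of_n`, and `toy_scorer` are hypothetical names introduced for illustration, and the toy scorer is a hard-coded stand-in for a real model call.

```python
from math import prod
from typing import Callable, List, Sequence

# Hypothetical scorer signature: (problem, previous_steps, current_step) -> score in [0, 1].
StepScorer = Callable[[str, Sequence[str], str], float]


def score_steps(problem: str, steps: Sequence[str], scorer: StepScorer) -> List[float]:
    """Score every step given the problem and the steps that precede it."""
    return [scorer(problem, steps[:i], step) for i, step in enumerate(steps)]


def aggregate(step_scores: Sequence[float], how: str = "min") -> float:
    """Collapse per-step scores into one solution-level score."""
    if not step_scores:
        raise ValueError("a solution needs at least one step")
    if how == "min":   # a solution is only as sound as its weakest step
        return min(step_scores)
    if how == "prod":  # treat step scores as independent correctness probabilities
        return prod(step_scores)
    if how == "last":  # look only at the final step, roughly an outcome-only view
        return step_scores[-1]
    raise ValueError(f"unknown aggregation: {how}")


def best_of_n(problem: str, candidates: Sequence[Sequence[str]],
              scorer: StepScorer, how: str = "min") -> int:
    """Return the index of the highest-scoring candidate (first one wins ties)."""
    return max(
        range(len(candidates)),
        key=lambda i: aggregate(score_steps(problem, candidates[i], scorer), how),
    )


def toy_scorer(problem: str, previous_steps: Sequence[str], step: str) -> float:
    """Stand-in for a real PRM: penalises one known-bad arithmetic claim."""
    return 0.1 if "2 + 2 = 5" in step else 0.9


problem = "Solve x = 2 + 2."
candidates = [
    ["2 + 2 = 5", "so x = 4"],  # correct final answer, flawed derivation
    ["2 + 2 = 4", "so x = 4"],  # correct throughout
]
print(best_of_n(problem, candidates, toy_scorer, how="min"))
```

With `how="min"` the flawed first step drags the first candidate down and the second is selected. With `how="last"` both candidates tie, because their final steps are identical, which is the outcome-only blind spot described above.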