Process Reward Model
technique · active
process-reward-model-6c172dc9 · 1 event · first seen 1mo ago · Aliases: Process Reward Model
Recent events (1)
Qwen2.5-Math Process Reward Model for Mathematical Reasoning Supervision
Alibaba's Qwen team introduces a process reward model (PRM) that aims to make mathematical reasoning in LLMs more reliable by supervising each intermediate reasoning step rather than only the final answer. It targets a specific failure mode: models that produce plausible but flawed intermediate derivations even when they reach the correct conclusion. The release includes model weights on Hugging Face and ModelScope, along with a GitHub repository.
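To make the difference between process and outcome supervision concrete, here is a minimal sketch. It is not the Qwen PRM or its API. Every name in it (`toy_arithmetic_scorer`, `step_rewards`, `solution_score`, `final_answer_correct`, `best_of_n`) is hypothetical. The "PRM" is a stand-in that only checks simple `a op b = c` arithmetic steps. The sketch assumes one common usage pattern: scoring every step given its prefix, then collapsing those step rewards with `min` (the weakest step) to rank candidate solutions.

```python
import math
import re
from typing import Callable, List, Sequence

# A step scorer maps (question, steps so far) to an estimated probability that
# the most recent step is correct. A real PRM would be a trained model; this
# hypothetical callable type just lets the sketch run on its own.
StepScorer = Callable[[str, Sequence[str]], float]

_ARITH = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")


def toy_arithmetic_scorer(question: str, prefix: Sequence[str]) -> float:
    """Hypothetical PRM stand-in: judges only the latest 'a op b = c' step."""
    match = _ARITH.match(prefix[-1])
    if match is None:
        return 0.5  # step is not simple arithmetic, so no opinion
    a, op, b, c = int(match[1]), match[2], int(match[3]), int(match[4])
    result = {"+": a + b, "-": a - b, "*": a * b}[op]
    return 0.95 if result == c else 0.05


def step_rewards(question: str, steps: Sequence[str], scorer: StepScorer) -> List[float]:
    """Process supervision: one reward per step, each conditioned on its prefix."""
    return [scorer(question, steps[: i + 1]) for i in range(len(steps))]


def solution_score(rewards: Sequence[float], how: str = "min") -> float:
    """Collapse per-step rewards into one solution score (assumed aggregations)."""
    if not rewards:
        raise ValueError("a solution needs at least one step")
    if how == "min":
        return min(rewards)  # a solution is only as sound as its weakest step
    if how == "prod":
        return math.prod(rewards)
    if how == "last":
        return rewards[-1]
    raise ValueError(f"unknown aggregation: {how}")


def final_answer_correct(steps: Sequence[str], expected: int) -> bool:
    """Outcome-only check: compares the final number and ignores the derivation."""
    return steps[-1].rsplit("=", 1)[-1].strip() == str(expected)


def best_of_n(
    question: str,
    candidates: Sequence[Sequence[str]],
    scorer: StepScorer,
    how: str = "min",
) -> Sequence[str]:
    """Pick the candidate whose aggregated step rewards are highest."""
    return max(candidates, key=lambda s: solution_score(step_rewards(question, s, scorer), how))


question = "Compute (2 + 3) * 4"
sound = ["2 + 3 = 5", "5 * 4 = 20"]
flawed = ["2 + 3 = 6", "5 * 4 = 20"]  # wrong first step, right final answer

print(final_answer_correct(sound, 20), final_answer_correct(flawed, 20))  # True True
print(step_rewards(question, flawed, toy_arithmetic_scorer))             # [0.05, 0.95]
print(best_of_n(question, [flawed, sound], toy_arithmetic_scorer))       # ['2 + 3 = 5', '5 * 4 = 20']
```

The outcome-only check accepts both solutions because both end in 20. The step-level scores expose the bad first step in `flawed`, so the `min` aggregation ranks `sound` higher. Whether a given PRM release recommends `min`, a product, or the last-step score is an assumption here and should be checked against its own documentation.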