
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

paper · active · provisional · skill-rm-unifying-heterogeneous-evaluation-criteria-via-agent-skill-c7d46330 · 1 event · first seen 14d ago




Recent events (1)

arXiv · cs.LG · 14d ago

Skill-RM: A unified reward model framework treating evaluation as an agentic skill

Researchers from the Qwen team propose Skill-RM, a framework that reformulates reward modeling as the execution of a reusable "Reward-Evaluation Skill." This lets a single model orchestrate heterogeneous evaluation criteria, including rule-based verifiers, ground-truth references, and rubrics. Because reward computation is treated as a structured agentic task, Skill-RM dynamically selects and aggregates evidence for each input rather than applying a fixed evaluation procedure to every input. Experiments on reward benchmarks and on downstream tasks (best-of-N selection and reinforcement learning) show consistent improvements over traditional judge baselines. The code is publicly released under the Qwen-Applications GitHub organization.
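The summary gives no implementation details. The following is therefore only a minimal sketch, under stated assumptions, of the general idea of choosing evaluation criteria per input and aggregating their evidence into one reward, which is then used for best-of-N selection. It is not Skill-RM's method or code. In Skill-RM a single model executes the skill and decides which evidence to gather. Here, fixed availability checks stand in for that decision. The scorers (`rule_verifier`, `reference_match`, `rubric_score`), the `Sample` fields, and the plain-mean aggregation are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Sample:
    """One (prompt, response) pair plus whatever evaluation signals exist for it."""
    prompt: str
    response: str
    reference: Optional[str] = None                   # ground-truth answer, if any
    rubric: list[str] = field(default_factory=list)   # rubric items (here: required phrases)
    answer_regex: Optional[str] = None                # format rule a verifier can check


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace before comparing strings.
    return " ".join(text.lower().split())


def rule_verifier(s: Sample) -> float:
    # Rule-based check: does the whole (stripped) response match the required format?
    return 1.0 if re.fullmatch(s.answer_regex, s.response.strip()) else 0.0


def reference_match(s: Sample) -> float:
    # Ground-truth check: case- and whitespace-insensitive exact match.
    return 1.0 if _normalize(s.response) == _normalize(s.reference) else 0.0


def rubric_score(s: Sample) -> float:
    # Toy rubric grader (hypothetical): fraction of rubric phrases found in the response.
    # A model-based system would grade each rubric item with an LLM instead.
    hits = sum(item.lower() in s.response.lower() for item in s.rubric)
    return hits / len(s.rubric)


# Hypothetical registry of (name, "does this criterion apply?", scorer).
EVALUATORS: list[tuple[str, Callable[[Sample], bool], Callable[[Sample], float]]] = [
    ("rule", lambda s: s.answer_regex is not None, rule_verifier),
    ("reference", lambda s: s.reference is not None, reference_match),
    ("rubric", lambda s: len(s.rubric) > 0, rubric_score),
]


def evaluate(s: Sample) -> tuple[float, dict[str, float]]:
    """Run only the criteria that apply to this input, then average their scores."""
    evidence = {name: score(s) for name, applies, score in EVALUATORS if applies(s)}
    if not evidence:
        return 0.0, evidence
    return sum(evidence.values()) / len(evidence), evidence


def best_of_n(prompt: str, candidates: list[str], **signals) -> str:
    """Best-of-N selection: keep the candidate with the highest aggregated reward."""
    return max(candidates, key=lambda c: evaluate(Sample(prompt, c, **signals))[0])
```

For example, `best_of_n("What is 6*7?", ["42", "The answer is 41", "forty-two"], reference="42", answer_regex=r"\d+")` selects `"42"`, the only candidate that passes both the format rule and the reference check. A prompt that carries only a rubric would be scored by `rubric_score` alone. This is the sense in which the set of applied criteria varies per input in this sketch.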