Entity · product

Skill-RM

product · active · skill-rm-806b8684 · 1 event · first seen Jun 3, 2026

Aliases: Skill-RM

Co-occurring entities

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill · Alibaba · Qwen

More like this (12)

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill · Skill Self-Play · SkillOpt · CompSkillBench · meta-skill · Skill Self-Play: Pushing the Frontier of LLM Capability with Co-Evolving Skills · SKILL.md · SkillWeaver · agent-skills · skills · G-RRM · RMM

Recent events (1)

arXiv · cs.LG · Jun 3, 2026

Skill-RM: A unified reward model framework treating evaluation as an agentic skill

Researchers from the Qwen team propose Skill-RM, a framework that reformulates reward modeling as the execution of a reusable 'Reward-Evaluation Skill,' enabling a single model to orchestrate heterogeneous evaluation criteria including rule-based verifiers, ground-truth references, and rubrics. By treating reward computation as a structured agentic task, Skill-RM dynamically selects and aggregates evidence per input rather than relying on static evaluation. Experiments on reward benchmarks and downstream tasks (best-of-N selection, RL) show consistent improvements over traditional judge baselines. The code is publicly released under the Qwen-Applications GitHub organization.
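The idea of selecting and aggregating evidence per input can be illustrated with a toy sketch. This is not the Skill-RM code or its API: every name below (Sample, rule_verifier, reference_match, rubric_score, reward, best_of_n) is hypothetical, the keyword-based rubric check and the plain averaging are assumptions made for illustration, and a hard-coded dispatcher stands in for the model-driven orchestration the summary describes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sample:
    prompt: str
    response: str
    reference: Optional[str] = None      # ground-truth answer, if one exists
    rubric: Optional[list[str]] = None   # keywords standing in for rubric items


# An evaluator returns a score in [0, 1], or None when it does not apply.
Evaluator = Callable[[Sample], Optional[float]]


def rule_verifier(s: Sample) -> Optional[float]:
    # Toy rule (assumption): applies only when the prompt asks for an integer.
    if "integer" not in s.prompt:
        return None
    return 1.0 if s.response.strip().lstrip("-").isdigit() else 0.0


def reference_match(s: Sample) -> Optional[float]:
    # Exact match against the ground-truth reference, when available.
    if s.reference is None:
        return None
    return 1.0 if s.response.strip() == s.reference.strip() else 0.0


def rubric_score(s: Sample) -> Optional[float]:
    # Fraction of rubric keywords mentioned in the response.
    if not s.rubric:
        return None
    hits = sum(1 for item in s.rubric if item.lower() in s.response.lower())
    return hits / len(s.rubric)


EVALUATORS: list[Evaluator] = [rule_verifier, reference_match, rubric_score]


def reward(s: Sample) -> float:
    # Keep only the evaluators that apply to this input, then average them.
    scores = [v for ev in EVALUATORS if (v := ev(s)) is not None]
    return sum(scores) / len(scores) if scores else 0.0


def best_of_n(prompt: str, candidates: list[str],
              reference: Optional[str] = None,
              rubric: Optional[list[str]] = None) -> str:
    # Best-of-N selection: return the candidate with the highest reward.
    return max(candidates,
               key=lambda r: reward(Sample(prompt, r, reference, rubric)))
```

For example, best_of_n("Give an integer: 2+2", ["four", "4", "5"], reference="4") scores each candidate with the rule check and the reference match and returns "4". A prompt with only a rubric attached would be scored by rubric_score alone.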

Evaluation and Benchmarking · Agent and Tool Ecosystem · Skill-RM · Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill · Alibaba · +2 more