Entity · paper

Reward Modeling for Multi-Agent Orchestration

paper · active · reward-modeling-for-multi-agent-orchestration-181bc679 · 1 event · first seen Jun 12, 2026

Aliases: Reward Modeling for Multi-Agent Orchestration

More like this (12)

Preference Coordinated Multi-agent Policy Optimization
Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play
Dynamic Verifiable Multi-Agent Human Agentic Loyalty Loop
Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch
multi-turn agent benchmarks
multi-agent cooperative framework
Improving LLM-Generated Process Model Quality Through Reinforcement Learning: The Role of Reward Function Design
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution
Institutional Red-Teaming: Deployment Rules, Not Just Models, Causally Shape Multi-Agent AI Safety
Multi-Agent Fictitious Play
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
multi-level agent evaluation

Recent events (1)

6 · arXiv · cs.CL · Jun 12, 2026

OrchRM: Self-supervised reward modeling for multi-agent orchestration without human annotations

Researchers propose Orchestration Reward Modeling (OrchRM), a self-supervised framework that trains reward models for LLM-based multi-agent orchestrators using intermediate execution artifacts to construct win-lose pairs for Bradley-Terry training. The approach avoids costly sub-agent rollouts by operating directly at the orchestration level, achieving up to 10x improvement in training token efficiency and up to 8% accuracy gains in test-time scaling. Results generalize across mathematical reasoning, web-based QA, and multi-hop reasoning tasks.
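For readers unfamiliar with Bradley-Terry training on win-lose pairs, the following is a minimal sketch of the pairwise objective and of reward-guided selection at test time. It is not OrchRM's implementation: the function names (`bt_loss`, `mean_bt_loss`, `best_of_n`) and the toy scores are hypothetical, the rewards are plain floats rather than outputs of a learned model, and how pairs are built from intermediate execution artifacts is not specified in the summary above and is not modeled here.

```python
import math


def bt_loss(reward_win: float, reward_lose: float) -> float:
    """Bradley-Terry negative log-likelihood for one win-lose pair.

    Equals -log(sigmoid(reward_win - reward_lose)), computed in a
    numerically stable form.
    """
    margin = reward_win - reward_lose
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_bt_loss(pairs):
    """Average Bradley-Terry loss over (reward_win, reward_lose) pairs."""
    return sum(bt_loss(w, l) for w, l in pairs) / len(pairs)


def best_of_n(candidates, score):
    """Test-time scaling sketch: keep the candidate the scorer rates highest."""
    return max(candidates, key=score)


# Hypothetical reward-model scores for three win-lose pairs.
pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
print(f"mean BT loss: {mean_bt_loss(pairs):.4f}")

# Hypothetical candidate orchestration plans with toy scores.
toy_scores = {"plan_a": 0.2, "plan_b": 0.9, "plan_c": -0.5}
print(best_of_n(list(toy_scores), toy_scores.get))
```

The loss shrinks as the winning item's reward exceeds the losing item's by a wider margin, which is what pushes a reward model to rank preferred orchestration decisions higher. Best-of-N selection is one common way to spend extra test-time compute with such a model; the summary does not say which selection scheme the paper uses.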

Agent and Tool Ecosystem · Alignment and RLHF · Reward Modeling for Multi-Agent Orchestration · OrchRM · Bradley-Terry