Entity · technique

Bradley-Terry

technique · active · bradley-terry-3b07d8c5 · 2 events · first seen Jun 12, 2026

Aliases: Bradley-Terry

Co-occurring entities

DITTO · TuneJury · Reward Modeling for Multi-Agent Orchestration · OrchRM · Wang-ML-Lab

More like this (12)

Bradley-Terry-Davidson · Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies · Traycer · Richardson · BERT-base · BraTS21 · BRANE · Andrew Kelley · Tony Lee · AgentBoard · Tom Brown · Bottman

Recent events (2)

4 · arXiv · cs.AI · Jun 16, 2026

TuneJury: Open pairwise reward model for text-to-music preference alignment

Researchers introduce TuneJury, an open-source instance-level pairwise reward model for text-to-music generation that predicts preference scores from text prompts and audio clips. The model is trained on publicly available human-preference labels spanning arena votes, crowdsourced comparisons, and expert ratings. A post-hoc anchor calibration method enables efficient adaptation to new generators without full retraining. The reward model drives gains across best-of-N selection, latent optimization, and expert-iteration post-training.

Alignment and RLHF · Multimodal Progress · DITTO · Bradley-Terry · TuneJury
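
To make the link between a pairwise reward model and best-of-N selection concrete, here is a minimal sketch. It assumes the reward model emits one scalar score per candidate and that preferences follow the Bradley-Terry form P(A preferred over B) = sigmoid(score_A - score_B). The function names, clip names, and scores are hypothetical and illustrative only; this is not TuneJury's architecture or API.

```python
import math


def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that A is preferred over B,
    given scalar reward scores for each."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def best_of_n(candidates, score_fn):
    """Best-of-N selection: return the candidate with the highest score."""
    return max(candidates, key=score_fn)


# Hypothetical scores standing in for a reward model's outputs
# on three generated clips for the same prompt.
toy_scores = {"clip_a": 0.2, "clip_b": 1.3, "clip_c": -0.5}

best = best_of_n(list(toy_scores), toy_scores.get)
p_b_over_a = bt_preference_prob(toy_scores["clip_b"], toy_scores["clip_a"])
```

Under this model only score differences matter, so adding a constant to every score changes neither the preference probabilities nor the best-of-N choice.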

6 · arXiv · cs.CL · Jun 12, 2026

OrchRM: Self-supervised reward modeling for multi-agent orchestration without human annotations

Researchers propose Orchestration Reward Modeling (OrchRM), a self-supervised framework that trains reward models for LLM-based multi-agent orchestrators using intermediate execution artifacts to construct win-lose pairs for Bradley-Terry training. The approach avoids costly sub-agent rollouts by operating directly at the orchestration level, achieving up to 10x improvement in training token efficiency and up to 8% accuracy gains in test-time scaling. Results generalize across mathematical reasoning, web-based QA, and multi-hop reasoning tasks.

Agent and Tool Ecosystem · Alignment and RLHF · Reward Modeling for Multi-Agent Orchestration · OrchRM · Bradley-Terry · +1 more
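
Bradley-Terry training on win-lose pairs, as mentioned in the summary above, can be sketched as minimizing the negative log-likelihood -log sigmoid(r(win) - r(lose)) over the pairs. The sketch below assumes a linear reward over hand-made feature vectors and plain gradient descent; real reward models are typically neural networks, and none of the names or data here come from OrchRM.

```python
import math


def reward(w, x):
    """Linear reward r(x) = w . x (an illustrative stand-in for a learned model)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bt_loss(w, pairs):
    """Mean Bradley-Terry loss: -log sigmoid(r(win) - r(lose))."""
    total = 0.0
    for win, lose in pairs:
        d = reward(w, win) - reward(w, lose)
        total += softplus(-d)  # equals -log(sigmoid(d))
    return total / len(pairs)


def train(pairs, dim, lr=0.5, steps=200):
    """Gradient descent on the mean Bradley-Terry loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for win, lose in pairs:
            d = reward(w, win) - reward(w, lose)
            coeff = -(1.0 - sigmoid(d))  # d(loss)/d(d) for one pair
            for i in range(dim):
                grad[i] += coeff * (win[i] - lose[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Hypothetical (winner, loser) feature pairs.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.9, 0.3], [0.1, 0.4]),
]
w = train(pairs, dim=2)
```

With zero weights every pair is a coin flip and the loss equals log 2; training should push each winner's reward above its loser's and lower the loss from that starting point.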