Entity · benchmark

ArenaHard

benchmark · active · arenahard-ef4f469c · 1 event · first seen Jun 3, 2026

Aliases: ArenaHard

More like this (12)

Arena-Hard Arena Code Arena AI Arena Search EvoArena Game Arena WebArena SWE-Bench-Pro-Hard-AA Vision Arena BashArena Video Arena OmniGameArena

Recent events (1)

arXiv · cs.CL · Jun 3, 2026

QUBRIC: Co-designing queries and rubrics for RL beyond verifiable rewards

QUBRIC is a framework that jointly optimizes queries and rubrics for reinforcement learning (RL) in settings where rewards are not strictly verifiable. It uses teacher-derived key points to rewrite open-ended queries into evaluable scenarios, applies contrastive rubric generation to capture gaps between the teacher and the policy, and filters for learnability before GRPO training. Trained only on instruction-following data, QUBRIC improves ArenaHard by 5.5 points over an SFT baseline and transfers to legal, moral, and narrative reasoning benchmarks (+6.3 points on average). These results suggest that rubric-based RL can complement reinforcement learning with verifiable rewards (RLVR) in non-verifiable domains.
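The summary does not say how QUBRIC scores responses against a rubric, computes advantages, or defines learnability. The sketch below shows one plausible way these pieces could fit together, under stated assumptions. It uses a weighted fraction of satisfied rubric criteria as the reward. It applies the standard GRPO group-relative advantage, where each reward is normalized by the mean and standard deviation of its sampling group. It also adds a pass-rate band filter that drops queries the policy always fails or always aces. The function names `rubric_reward`, `group_relative_advantages`, and `is_learnable` are hypothetical, as are the 0.1/0.9 thresholds and the example weights. None of them come from the paper.

```python
from statistics import mean, pstdev


def rubric_reward(satisfied: list[bool], weights: list[float]) -> float:
    """Hypothetical scoring rule: weighted fraction of rubric criteria met."""
    if len(satisfied) != len(weights):
        raise ValueError("need exactly one weight per rubric criterion")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for ok, w in zip(satisfied, weights) if ok) / total


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward normalized against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_learnable(group_rewards: list[float], low: float = 0.1, high: float = 0.9) -> bool:
    """Hypothetical learnability filter (assumed thresholds, not QUBRIC's).

    A query whose sampled responses all receive the same reward yields
    zero group-relative advantages and therefore no gradient signal, so
    it is dropped, as are queries that are almost always failed or aced.
    """
    avg = mean(group_rewards)
    return low < avg < high and pstdev(group_rewards) > 0


# Example: four sampled responses to one query, scored on three
# hypothetical criteria (key point covered, format followed, no
# unsupported claims), with the first criterion weighted double.
weights = [2.0, 1.0, 1.0]
group = [
    rubric_reward([True, True, False], weights),
    rubric_reward([True, False, False], weights),
    rubric_reward([False, True, True], weights),
    rubric_reward([True, True, True], weights),
]
print(group)  # [0.75, 0.5, 0.5, 1.0]
print(is_learnable(group))  # True
print(group_relative_advantages(group))
```

Group normalization makes the advantages sum to roughly zero within each query. Responses that satisfy more of the rubric are pushed up and the rest are pushed down. This is why a filter that removes zero-variance groups saves compute without losing any training signal.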

Evaluation and Benchmarking · Alignment and RLHF · QUBRIC · GRPO · ArenaHard