benchmark
ArenaHard
benchmark · active · provisional
arenahard-ef4f469c · 1 event · first seen 13d ago · Aliases: ArenaHard
Recent events (1)
QUBRIC: Co-designing queries and rubrics for RL beyond verifiable rewards
QUBRIC is a framework that jointly optimizes queries and rubrics for reinforcement learning (RL) in settings where rewards cannot be strictly verified. It uses key points derived from a teacher model to rewrite open-ended queries into evaluable scenarios, applies contrastive rubric generation to capture gaps between the teacher and the policy, and filters queries for learnability before GRPO training. Trained only on instruction-following data, QUBRIC gains 5.5 points on ArenaHard over an SFT baseline and transfers to legal, moral, and narrative reasoning benchmarks (+6.3 points on average), suggesting that rubric-based RL can complement RL with verifiable rewards (RLVR) in non-verifiable domains.
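As a rough illustration of the reward side of such a pipeline, the Python sketch below scores a group of sampled responses against a weighted rubric, drops queries whose reward group has no variance, and computes GRPO-style group-relative advantages. The weighted-fraction reward, the zero-variance learnability rule, the rubric weights, and all function names (`rubric_reward`, `is_learnable`, `grpo_advantages`) are hypothetical; the summary does not say how QUBRIC scores rubrics or filters for learnability. Only the group normalization, (r - mean) / std, follows the usual GRPO formulation.

```python
from statistics import mean, pstdev


def rubric_reward(satisfied: list[bool], weights: list[float]) -> float:
    """Hypothetical reward: fraction of total rubric weight a response satisfies."""
    total = sum(weights)
    return sum(w for ok, w in zip(satisfied, weights) if ok) / total


def is_learnable(rewards: list[float], min_std: float = 1e-6) -> bool:
    """Assumed filter: keep a query only if its sampled rewards vary.

    A group with identical rewards gets zero advantage under GRPO,
    so it contributes no learning signal.
    """
    return pstdev(rewards) > min_std


def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantage: normalize each reward by the group mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage with made-up rubric checks for four sampled responses to one query.
weights = [2.0, 1.0, 1.0]  # hypothetical weights for three rubric items
group = [
    [True, True, False],
    [True, False, False],
    [False, False, False],
    [True, True, True],
]
rewards = [rubric_reward(s, weights) for s in group]  # [0.75, 0.5, 0.0, 1.0]
if is_learnable(rewards):
    advantages = grpo_advantages(rewards)  # positive for above-average responses
```

Normalizing within the group means a response is rewarded only relative to its siblings, which is why a zero-variance group is a natural candidate for filtering in this sketch.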