Entity · paper

QUBRIC

paper · active · qubric-76495fed · 1 event · first seen Jun 3, 2026

Aliases: QUBRIC

Co-occurring entities

GRPO ArenaHard

More like this (12)

PQuAD SQuAD RULER QA-2 MedQADE CUAD EQ-Bench MQuAKE SQA3D VQA-RAD StrategyQA IQL FreshQA

Recent events (1)

6 · arXiv · cs.CL · Jun 3, 2026

QUBRIC: Co-designing queries and rubrics for RL beyond verifiable rewards

QUBRIC is a framework that jointly optimizes queries and rubrics for reinforcement learning in settings where rewards are not strictly verifiable. The approach uses teacher-derived key points to rewrite open-ended queries into evaluable scenarios, applies contrastive rubric generation to capture teacher-policy gaps, and filters for learnability before GRPO training. Trained only on instruction-following data, QUBRIC achieves a +5.5 point gain on ArenaHard over an SFT baseline and transfers to legal, moral, and narrative reasoning benchmarks (+6.3 points average), suggesting rubric-based RL can complement RLVR in non-verifiable domains.
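The filter-then-train step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not specify the rubric scoring rule or the learnability criterion, so `rubric_reward` (weighted fraction of satisfied criteria) and `is_learnable` (non-zero reward spread within a sampled group) are hypothetical stand-ins. Only the group-relative standardization in `grpo_advantages` follows the usual GRPO formulation, and the choice of population standard deviation is an assumption here.

```python
from statistics import mean, pstdev


def rubric_reward(satisfied, weights):
    """Assumed scoring rule: weighted fraction of rubric criteria that a response satisfies."""
    total = sum(weights)
    return sum(w for ok, w in zip(satisfied, weights) if ok) / total


def is_learnable(rewards, min_std=1e-6):
    """Assumed filter: keep a query only if rewards vary across its sampled responses.

    If every response in the group scores the same, the group-relative
    advantages are all zero and the query contributes no training signal.
    """
    return pstdev(rewards) > min_std


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled responses to one query,
# each checked against three rubric criteria with weights 1, 2, 1.
weights = [1.0, 2.0, 1.0]
checks = [
    [True, True, True],
    [True, False, True],
    [False, False, True],
    [True, True, False],
]
rewards = [rubric_reward(c, weights) for c in checks]

if is_learnable(rewards):
    advantages = grpo_advantages(rewards)
```

Under these assumptions, a query whose sampled responses all satisfy the same criteria is dropped before training, since standardizing identical rewards yields no gradient signal.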

Evaluation and Benchmarking · Alignment and RLHF · QUBRIC · GRPO · ArenaHard