DeepRubric

paper · active · provisional · deeprubric-998f055e · 1 event · first seen 35h ago

Aliases: DeepRubric, DeepRubric-8B

Recent events (1)

arXiv · cs.CL · 35h ago

DeepRubric: Evidence-tree rubric supervision cuts RL training cost for deep research agents by 13x

DeepRubric is a data construction framework that improves reinforcement learning (RL) efficiency for deep research agents by reversing the usual rubric-generation process: rather than inferring evaluation criteria from a query, it first builds an evidence tree of verifiable sub-questions, then synthesizes aligned query-rubric pairs from that tree. The authors construct 9K training examples and train DeepRubric-8B with rubric-based GRPO, reaching performance comparable to prior open-source state-of-the-art deep research models on three benchmarks while using roughly 13x fewer RL GPU-hours. The work targets a key bottleneck in RL training of long-form research agents: unreliable reward signals caused by incomplete rubrics.
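
To make the tree-first idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical `EvidenceNode` schema, treats each node's answer as one rubric criterion, scores a report by naive substring matching (a stand-in for whatever judge the paper actually uses), and applies the standard GRPO step of standardizing rewards within a sampled group. All names and the toy data are illustrative assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class EvidenceNode:
    """One verifiable sub-question and its known answer (hypothetical schema)."""
    question: str
    answer: str
    children: list = field(default_factory=list)


def flatten(node):
    """Return every node in the tree, depth-first."""
    nodes = [node]
    for child in node.children:
        nodes.extend(flatten(child))
    return nodes


def build_rubric(root):
    """Derive one criterion per node: the report should state that node's answer."""
    return [(n.question, n.answer) for n in flatten(root)]


def rubric_reward(report, rubric):
    """Fraction of criteria met. Substring matching is an assumed stand-in judge."""
    text = report.lower()
    hits = sum(1 for _, answer in rubric if answer.lower() in text)
    return hits / len(rubric)


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy evidence tree built before any query or rubric exists.
tree = EvidenceNode(
    "What is the capital of France?", "Paris",
    children=[
        EvidenceNode("Which river runs through it?", "Seine"),
        EvidenceNode("In which year did the Eiffel Tower open?", "1889"),
    ],
)
rubric = build_rubric(tree)

# Three sampled reports for the same query form one GRPO group.
reports = [
    "Paris lies on the Seine; the Eiffel Tower opened in 1889.",
    "The capital is Paris.",
    "No relevant findings.",
]
rewards = [rubric_reward(r, rubric) for r in reports]
advantages = group_advantages(rewards)
```

Because the rubric is read off the tree rather than inferred from the query, every criterion in this sketch has a known answer to check against, which is the property the summary credits for more reliable reward signals.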