Entity · benchmark

RubricsTree

benchmark · active · rubricstree-81cd981d · 1 event · first seen Jun 17, 2026

Aliases: RubricsTree


More like this (12)

Rubrics on Trial · Rubric Reward · DeepRubric · Rubric-based Feedback Evaluation · EvalTree · DominoTree · Preference-Aware Rubric Learning · When Rubrics Change: Cross-Rubric Generalization for Critical Thinking Essay Scoring · rubric-based rewards · DDTree · TreeSim · GraphReview

Recent events (1)

arXiv · cs.CL · Jun 17, 2026

RubricsTree: Scalable hierarchical rubric framework for evaluating personal health AI agents

RubricsTree is a new evaluation framework for LLM-powered personal health agents. It is built around a hierarchical taxonomy of over 100 clinically verifiable Boolean rubrics, derived from 4,000 real user queries and curated with physician oversight. A context-aware router activates only the rubrics relevant to each query, which keeps evaluation scalable while staying aligned with expert judgment. The framework outperforms strong LLM-as-a-judge baselines on expert alignment and, when used as a training signal, yields relative gains of up to ~66% on HealthBench across the Gemini, GPT, and Qwen model families. The work addresses a concrete bottleneck in the clinical deployment of health AI: the cost-quality tradeoff in evaluation.
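The idea of a rubric tree with a router can be illustrated with a small sketch. The summary does not describe the paper's router, rubric format, or score aggregation, so everything below is an assumption and every name is hypothetical. Branches are pruned by keyword triggers, which stand in for the paper's context-aware router. Leaves hold Boolean checks, implemented here as toy string predicates in place of a real judge. The final score is taken to be the pass rate over the activated rubrics.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RubricNode:
    """Hypothetical node: internal nodes group rubrics, leaves carry a Boolean check."""
    name: str
    # Keywords that activate this branch; a toy stand-in for a context-aware router.
    # An empty list means the branch is always active.
    triggers: List[str] = field(default_factory=list)
    # Leaf-only pass/fail check on a response (toy predicate, not a clinical judge).
    check: Optional[Callable[[str], bool]] = None
    children: List["RubricNode"] = field(default_factory=list)


def route(node: RubricNode, query: str) -> List[RubricNode]:
    """Return the leaf rubrics whose branches are relevant to the query."""
    q = query.lower()
    if node.triggers and not any(t in q for t in node.triggers):
        return []  # irrelevant branch: skip the whole subtree
    if node.check is not None:
        return [node]  # leaf rubric
    active: List[RubricNode] = []
    for child in node.children:
        active.extend(route(child, query))
    return active


def score(root: RubricNode, query: str, response: str) -> Dict[str, object]:
    """Evaluate only the activated rubrics; aggregate as a pass rate (an assumption)."""
    results = {leaf.name: leaf.check(response) for leaf in route(root, query)}
    rate = sum(results.values()) / len(results) if results else None
    return {"results": results, "score": rate}


# Hypothetical tree: one always-on safety branch plus two topic branches.
root = RubricNode("root", children=[
    RubricNode("safety", children=[
        RubricNode("recommends_professional_care",
                   check=lambda r: "doctor" in r.lower() or "clinician" in r.lower()),
    ]),
    RubricNode("sleep", triggers=["sleep", "insomnia"], children=[
        RubricNode("mentions_sleep_routine", check=lambda r: "routine" in r.lower()),
    ]),
    RubricNode("medication", triggers=["dose", "medication"], children=[
        RubricNode("avoids_specific_dosing", check=lambda r: "mg" not in r.lower()),
    ]),
])

if __name__ == "__main__":
    out = score(root, "I can't sleep, any tips for insomnia?",
                "Keep a consistent bedtime routine, and see a doctor if it persists.")
    print(out)  # the medication branch is never activated for this query
```

The routing step is where the scalability comes from. Each query pays only for the rubrics on its active branches, so the remaining rubrics in the tree are never evaluated for it.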

Evaluation and Benchmarking · AI Safety Research · HealthBench · RubricsTree · Qwen