benchmark
RubricsTree
benchmark · active · provisional
rubricstree-81cd981d · 1 event · first seen 3h ago
Aliases: RubricsTree
Recent events (1)
RubricsTree: Scalable hierarchical rubric framework for evaluating personal health AI agents
RubricsTree is a new evaluation framework for LLM-powered personal health agents. It is built around a hierarchical taxonomy of over 100 clinically verifiable Boolean rubrics, derived from 4,000 real user queries and curated with physician oversight. A context-aware router activates only the rubrics relevant to each query, which keeps evaluation scalable while staying aligned with expert judgment. The framework outperforms strong LLM-as-a-judge baselines on expert alignment and, when used as a training signal, yields relative gains of up to ~66% on HealthBench across the Gemini, GPT, and Qwen model families. The work addresses a concrete bottleneck in the clinical deployment of health AI: the cost-quality tradeoff in evaluation.
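The routing idea described above can be illustrated with a minimal Python sketch. This is not the RubricsTree implementation: the class names (`Rubric`, `Category`), the keyword-based `route` function, the example rubrics, and the pass-fraction `score` are all hypothetical stand-ins chosen for illustration. The summary does not say how the actual router or rubric checks work, and they may well rely on an LLM rather than keyword matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rubric:
    """A Boolean leaf rubric: check(response) returns True or False."""
    name: str
    check: Callable[[str], bool]


@dataclass
class Category:
    """A taxonomy node. Empty triggers means the node is always active."""
    name: str
    triggers: List[str]
    rubrics: List[Rubric] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)


def route(node: Category, query: str) -> List[Rubric]:
    """Hypothetical keyword router: descend only into matching subtrees."""
    q = query.lower()
    if node.triggers and not any(t in q for t in node.triggers):
        return []  # prune this whole subtree
    active = list(node.rubrics)
    for child in node.children:
        active.extend(route(child, query))
    return active


def evaluate(root: Category, query: str, response: str) -> Dict[str, bool]:
    """Apply only the routed rubrics to the response."""
    return {r.name: r.check(response) for r in route(root, query)}


def score(results: Dict[str, bool]) -> float:
    """Fraction of active rubrics that passed (an assumed aggregation)."""
    return sum(results.values()) / len(results) if results else 0.0


# Toy taxonomy with invented rubrics, for illustration only.
root = Category(
    name="health",
    triggers=[],
    rubrics=[Rubric("recommends_professional_followup",
                    lambda r: "doctor" in r.lower() or "clinician" in r.lower())],
    children=[
        Category("sleep", ["sleep", "insomnia"],
                 rubrics=[Rubric("mentions_sleep_duration",
                                 lambda r: "hours" in r.lower())]),
        Category("medication", ["dose", "medication"],
                 rubrics=[Rubric("avoids_specific_dosing",
                                 lambda r: "mg" not in r.lower())]),
    ],
)

sleep_results = evaluate(
    root,
    "I have trouble with sleep lately",
    "Most adults need 7 to 9 hours; see a doctor if it persists.",
)
med_results = evaluate(
    root,
    "What dose of medication should I take?",
    "Take 500 mg.",
)
```

In this toy setup the sleep query activates the root rubric and the sleep rubric but never evaluates the medication rubric, which is the property that keeps per-query cost low as the taxonomy grows. A per-response score of this kind could also serve as a reward signal during training, though the summary gives no detail on how the authors do that.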