benchmark
TherapeuticsBench
benchmark · active · provisional
therapeuticsbench-fe17f017 · 1 event · first seen 2d ago
Aliases: TherapeuticsBench
Recent events (1)
TxBench-PP: New benchmark reveals AI agents struggle with preclinical pharmacology decisions
Researchers introduce TxBench-PP (TherapeuticsBench Preclinical Pharmacology), a 100-evaluation benchmark testing AI agents on realistic small-molecule drug discovery tasks including mechanism-of-action reasoning, compound-target engagement, and translational efficacy. Agents receive real workflow snapshots and are graded deterministically on structured answers. Across 16 model-harness configurations and 4,800 trajectories, no system reliably succeeded; the best performer, Claude Opus 4.8 with the Pi harness, passed only 59.3% of endpoint attempts. The results suggest current frontier models are not yet deployment-ready for autonomous preclinical pharmacology decision-making.