Almanac
benchmark

Patient Health Questionnaire-9

benchmark · active · provisional · patient-health-questionnaire-9-00536954 · 1 event · first seen 5h ago

Aliases: Patient Health Questionnaire-9

Recent events (1)

5 · arXiv · cs.CL · 5h ago

Fine-tuning LLMs to passively estimate depression severity from AI mental health conversations

Researchers fine-tune a Qwen3.5-27B model with a regression head to predict PHQ-9 depression severity scores directly from conversation transcripts in an AI mental health app, so users do not have to complete the self-report questionnaire. The training set covers 6,283 users and combines 3,111 ground-truth labels with pseudolabels generated by Claude Opus and by iterative intermediate models. On a held-out test set of 842 users, the best model achieves a mean absolute error (MAE) of 2.6 points, a Pearson correlation of r = 0.80, and an AUC of 0.91 at the clinical PHQ-9 ≥ 10 threshold, with AUC > 0.87 across all severity thresholds. The work demonstrates a passive, continuous approach to symptom monitoring that could reduce response bias on mental health platforms.
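The three reported metrics can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names (`mae`, `pearson_r`, `auc_at_threshold`) are hypothetical, and the six scores are invented toy values, not data from the paper. The AUC is computed by binarizing the true PHQ-9 total at a cutoff (10 here) and ranking users by the predicted score.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted PHQ-9 totals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def auc_at_threshold(y_true, y_pred, threshold=10):
    """AUC for detecting true score >= threshold, ranking by predicted score.

    Uses the pairwise definition: the fraction of (positive, negative)
    pairs in which the positive user has the higher prediction, with
    ties counted as half.
    """
    pos = [p for t, p in zip(y_true, y_pred) if t >= threshold]
    neg = [p for t, p in zip(y_true, y_pred) if t < threshold]
    if not pos or not neg:
        raise ValueError("need at least one user on each side of the threshold")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (assumed, not taken from the paper).
true_scores = [2, 7, 12, 18, 9, 14]                # self-reported PHQ-9 totals
pred_scores = [3.0, 6.0, 11.0, 16.0, 10.5, 9.5]    # model outputs

toy_mae = mae(true_scores, pred_scores)
toy_r = pearson_r(true_scores, pred_scores)
toy_auc = auc_at_threshold(true_scores, pred_scores, threshold=10)
print(f"MAE={toy_mae:.2f}  r={toy_r:.2f}  AUC@10={toy_auc:.2f}")
```

Calling `auc_at_threshold` with other cutoffs gives a per-threshold AUC of the kind the summary reports. The source does not list which severity thresholds were used; the conventional PHQ-9 cut points are 5, 10, 15, and 20.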