Fine-tuning LLMs to passively estimate depression severity from AI mental health conversations
Researchers fine-tune a Qwen3.5-27B model with a regression head to predict PHQ-9 depression severity scores directly from AI mental health app conversation transcripts, eliminating the need for explicit self-report completion. The training set of 6,283 users combines 3,111 ground-truth labels with pseudolabels generated by Claude Opus and iterative intermediate models. On a held-out test of 842 users, the best model achieves MAE=2.6, Pearson r=0.80, and AUC=0.91 at the clinical PHQ-9≥10 threshold, with AUC>0.87 across all severity thresholds. The work demonstrates a passive, continuous symptom-monitoring approach that could reduce response bias in mental health platforms.
Related guides (3)
Related events (8)
LLMs predict dementia and depression severity from clinical interview transcripts in zero-shot and feature-extraction settings
Researchers evaluate three open-weights LLMs (Mistral 3.1, DeepHermes, Qwen3) for predicting dementia and depression severity from speech transcripts of 154 German-speaking patients in standardized clinical interviews. The study introduces a new observer-based Global Depression Scale (GDS-D) and tests both zero-shot prediction and LLM-based feature extraction for Support Vector Regression. Zero-shot performs well for depression (MAE 0.60), while structured feature extraction reduces dementia assessment error by up to 35%; pause-enriched automatic transcripts match human transcription quality, suggesting viable fully-automated screening pipelines.
LLUMI: Fine-Tuning Open-Source LLMs for Mental Health Writing Assistance Using Reddit Community Feedback
LLUMI is a two-component system (a generation model and an improvement model) designed to provide mental health writing assistance using smaller open-source LLMs hosted in privacy-preserving, on-premise environments. The system leverages Reddit community endorsement signals (upvotes/downvotes) to construct preference pairs for SFT and DPO training, then further aligns outputs via human evaluation across readability, empathy, connection, actionability, and safety dimensions. Results show LLUMI achieves performance comparable to proprietary GPT-based models on linguistic and human evaluations, suggesting community-derived preference signals can substitute for expensive expert labeling in sensitive domains.
Dep-LLM: Training-free depression diagnosis framework using structured multi-factor LLM reasoning
Dep-LLM is a training-free framework for automatic depression detection from clinical interviews that uses frozen foundation LLMs without fine-tuning. The system decomposes long clinical dialogues into five thematic factors via Chain-of-Thought analysis, applies token-level entropy-based confidence modulation, and integrates multi-factor signals for final diagnosis. Evaluated on DAIC-WOZ and E-DAIC datasets, it outperforms zero-shot baselines across 21 foundation LLMs and surpasses supervised domain-specific and commercial LLMs on multiple metrics.
Evidence-Augmented ML for Self-Harm Surveillance in Emergency Department Triage Notes
Researchers developed a three-stage pipeline combining traditional machine learning with LLM-based screening and evidence extraction to detect self-harm in Australian emergency department triage notes. The system achieved AUPRCs around 0.88 in both internal and external validation, and transferred to two external hospital sites without site-specific retraining. A notable capability is identifying the primary self-harm method with 95% accuracy, enabling more granular public health surveillance beyond binary classification.
OpenAI Improves ChatGPT Mental Health Responses with Expert Collaboration
OpenAI worked with over 170 mental health experts to enhance ChatGPT's handling of sensitive conversations involving distress. The update improves the model's ability to recognize emotional distress, respond with empathy, and direct users to real-world support resources. OpenAI reports a reduction in unsafe responses of up to 80% as a result of these changes.
Reverse Probing: Supervised Token-level Uncertainty Quantification for LLMs in Clinical Text
The paper introduces Reverse Probing, a novel uncertainty quantification framework designed specifically for clinical text summarization that estimates token-level uncertainty from pre-existing labeled summaries rather than sampling new outputs. It extracts uncertainty signals from four categories of internal model activations, treating text as a probe into the model's internal state. Evaluated on two expert-annotated clinical datasets, it outperforms eight adapted baselines on all metrics, achieving up to 4× higher AUPRC while reducing inference time and compute. Feature analysis identifies delta energy and neighborhood context as the most consistent predictors of uncertainty across models.
Foundation Model for Wearable Health Data Pretrained on 1 Trillion Minutes from 5 Million Participants
Researchers propose a large-scale foundation model for wearable health data, pretrained on over one trillion minutes of unlabeled sensor signals from five million participants. The model demonstrates systematic performance improvements across 35 health prediction tasks spanning cardiovascular, metabolic, sleep, and mental health domains, with joint scaling of model capacity and data volume. A 'classroom' of LLM agents autonomously searches downstream predictive head configurations, and the resulting embeddings are integrated into a Personal Health Agent validated by 1,860 clinician ratings. The work establishes label-efficient few-shot learning and generative capabilities for daily health metric estimation.
Qwen releases Qwen3.5-35B-A3B multimodal MoE model on Hugging Face
Qwen has released Qwen3.5-35B-A3B, a 35B-parameter mixture-of-experts image-text-to-text model with approximately 3B active parameters, published on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 2.8 million downloads and 1,400+ likes, it has seen substantial community uptake.


