Entity · paper

PsychoSafe

paper · active · psychosafe-b3cb16f2 · 1 event · first seen Jun 9, 2026

Aliases: PsychoSafe


More like this (12)

TrajSafe · PsyBridge · Psy-CoT · SafeCoder · Behavioral-SafetyBench · eating disorder safety evaluation · SafeCtrl-RL · SecureBio · SafetyKit · SPDNet · SPyCE · Safety Usage Dashboard

Recent events (1)

5 · arXiv · cs.CL · Jun 9, 2026

PsychoSafe: Framework for Psychologically-Informed LLM Refusals in High-Risk Interactions

Researchers introduce PsychoSafe, a refusal framework that reframes LLM non-compliance as structured supportive communication grounded in evidence-based psychological intervention strategies. The work constructs an 8,019-example prompt-response corpus across five risk domains and applies prompting and parameter-efficient fine-tuning to Qwen 3.5 27B, achieving a 28.1% improvement in refusal quality over a generic baseline, with notable gains in resource referral and psychological grounding. Evaluations on SORRY-Bench and XSTest show strong in-domain robustness but limited out-of-domain generalization, which points to a need for more diverse fine-tuning data. The framework is relevant to safety alignment work targeting crisis, coercion, and escalating-intent scenarios.

AI Safety Research · Alignment and RLHF · Qwen 3.5 27B · SORRY-Bench · XSTest