Entity · technique

suicide and self-harm classifier

technique · active · suicide-and-self-harm-classifier-f34aa82c · 1 event · first seen Jun 1, 2026

Aliases: suicide and self-harm classifier

Co-occurring entities

claude.ai · Claude Opus 4.6 · Reinforcement Learning from Human Feedback · Claude Sonnet 4.5 · Claude Haiku 4.5 · sycophancy · International Association for Suicide Prevention · ThroughLine · Anthropic

More like this (12)

Analysing Self-Harm Representations in Language Models: a Cross-Architecture Study suicide meme content moderation Safety Detection Classifier International Association for Suicide Prevention AutoSkillHarm Self-Evolving Human-Centered Framework for Explainable Depression Symptom Annotation probing classifiers eating disorder safety evaluation A Multi-Agent System for Autonomous, Fine-Tuning-Free Clinical Symptom Detection: Development and Validation Study SkillHarm Self-Supervised Learning Neural Classification Trees

Recent events (1)

Anthropic News · Jun 1, 2026

Anthropic Details Safeguards for User Wellbeing: Crisis Detection, Anti-Sycophancy, and Evaluation Results

Anthropic has published a detailed account of its user wellbeing safeguards, covering how Claude handles suicide and self-harm conversations through model training, system prompts, and a real-time crisis classifier integrated with ThroughLine's global helpline network. The post discloses evaluation results for Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5, showing 98–99% appropriate response rates on high-risk single-turn prompts and very low false-refusal rates on benign requests. Anthropic also addresses anti-sycophancy efforts and an 18+ age requirement for Claude.ai. The company is partnering with the International Association for Suicide Prevention (IASP) to further inform training and product design.

Evaluation and Benchmarking · AI Safety Research · claude.ai · Claude Opus 4.6 · Reinforcement Learning from Human Feedback