paper
Online Safety Monitoring for LLMs
paper · active · provisional
online-safety-monitoring-for-llms-19fe32b3 · 1 event · first seen 12h ago
Aliases: Online Safety Monitoring for LLMs
More like this (12)
LLM-as-monitor
Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond
LLM Safety Leaderboard
What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?
Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues
open-source LLMs
Clinically Grounded Privacy Evaluation of Medical LMs
LLMScan
electronic warfare LLM
frontier LLMs
Multi-Agentic System Leveraging Open-Source LLMs to Mitigate Disinformation Threats
Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research
Recent events (1)
Online Safety Monitoring for LLMs via Threshold-Based Risk Control
A new arXiv preprint proposes a real-time safety monitor for LLMs that thresholds an external verifier's signal to raise an alarm, with the threshold calibrated via risk control. The authors evaluate the approach on mathematical reasoning and red-teaming datasets and report that it is competitive with more complex monitors based on sequential hypothesis testing. The work targets a practical deployment problem: detecting unsafe outputs at deployment time, after alignment training.
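The summary does not give the preprint's exact calibration procedure, so the following is only a minimal sketch of the general idea, written under stated assumptions rather than as the authors' method. It assumes that a higher verifier score means "more likely unsafe", that the risk being controlled is the missed-detection rate on unsafe outputs, and that calibration and deployment scores are exchangeable. The function names `calibrate_threshold` and `monitor` are hypothetical. The threshold is taken as a conservative order statistic of verifier scores from known-unsafe calibration examples, in the style of conformal calibration.

```python
import math


def calibrate_threshold(unsafe_scores, alpha):
    """Pick a threshold so that an unsafe output scores below it
    (a missed detection) with probability at most alpha.

    Assumes higher score = more unsafe and exchangeable data.
    Both are illustrative assumptions, not taken from the preprint.
    """
    n = len(unsafe_scores)
    # Finite-sample correction: use the k-th smallest calibration
    # score with k = floor(alpha * (n + 1)).
    k = math.floor(alpha * (n + 1))
    if k < 1:
        # Too few calibration points for this alpha: always alarm.
        return float("-inf")
    return sorted(unsafe_scores)[k - 1]


def monitor(score_stream, threshold):
    """Return the index of the first score that triggers the alarm,
    or None if no score reaches the threshold."""
    for t, score in enumerate(score_stream):
        if score >= threshold:
            return t
    return None


# Hypothetical verifier scores on known-unsafe calibration outputs.
calibration = [0.2, 0.9, 0.5, 0.7, 0.4, 0.8, 0.3]
threshold = calibrate_threshold(calibration, alpha=0.25)

# Hypothetical scores arriving one at a time during generation.
first_alarm = monitor([0.1, 0.25, 0.4, 0.9], threshold)
```

This sketch bounds only the missed-detection rate; it says nothing about false alarms on safe outputs, which a practical monitor would also need to measure. A sequential hypothesis testing monitor would instead accumulate evidence across steps, which is the more complex baseline the summary mentions.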