technique
output-centric safety training
active
output-centric-safety-training-521d7ff1 · 1 event · first seen 28d ago
Aliases: output-centric safety training
Recent events (1)
From hard refusals to safe-completions: toward output-centric safety training
OpenAI introduces a 'safe-completions' approach in GPT-5: output-centric safety training that replaces hard refusals when handling dual-use prompts. Rather than refusing requests outright, the model is trained to produce responses that are both helpful and safe by shaping the content of its outputs. This shifts how safety and helpfulness are balanced during training, from binary refusal behavior toward graduated response strategies.