technique
safe-completions
safe-completions-9779a072 · 1 event · first seen 28d ago
Aliases: safe-completions
Recent events (1)
From hard refusals to safe-completions: toward output-centric safety training
OpenAI introduces a 'safe-completions' approach in GPT-5 that replaces hard refusals with output-centric safety training for handling dual-use prompts. Rather than refusing such requests outright, the model is trained to produce responses that are both helpful and safe by shaping the content of its outputs. This shifts how safety and helpfulness are balanced during training, from binary refuse-or-comply behavior toward graduated response strategies.