Almanac
technique

output-centric safety training

technique·active·output-centric-safety-training-521d7ff1·1 event·first seen 28d ago

Aliases: output-centric safety training

Recent events (1)

7·OpenAI Blog·28d ago

From hard refusals to safe-completions: toward output-centric safety training

OpenAI introduces a 'safe-completions' approach in GPT-5 that replaces hard refusals with output-centric safety training for handling dual-use prompts. Rather than refusing such requests outright, the model is trained to produce responses that are both helpful and safe, with safety enforced on the content of the output. This shifts how safety and helpfulness are balanced during training, from binary refusal behavior to graduated response strategies.