deliberative alignment
technique · active
deliberative-alignment-f9adbfb7 · 1 event · first seen 28d ago
Aliases: deliberative alignment
Recent events (1)
Deliberative Alignment: Reasoning Enables Safer Language Models
OpenAI introduces deliberative alignment, a new alignment strategy applied to o1 models in which the model is directly taught safety specifications and trained to reason over them at inference time. Unlike prior approaches that embed safety implicitly through RLHF, this method makes safety reasoning explicit and inspectable. The announcement positions deliberative alignment as a meaningful advance in scalable oversight and safe deployment of frontier reasoning models.