Almanac
technique

deliberative alignment

technique · active · deliberative-alignment-f9adbfb7 · 1 event · first seen 28d ago

Aliases: deliberative alignment

Recent events (1)

7 · OpenAI Blog · 28d ago

Deliberative Alignment: Reasoning Enables Safer Language Models

OpenAI introduces deliberative alignment, a new alignment strategy applied to its o1 models in which the model is taught its safety specifications directly and trained to reason over them explicitly at inference time, before it answers. Unlike prior approaches that embed safety only implicitly through RLHF, this method makes the model's safety reasoning explicit and inspectable. The announcement positions deliberative alignment as an advance in scalable oversight and in the safe deployment of frontier reasoning models.