Rule-Based Rewards

technique · active · rule-based-rewards-4ff25ef0 · 1 event · first seen 28d ago

Aliases: Rule-Based Rewards

Recent events (1)

6 · OpenAI Blog · 28d ago

Improving Model Safety Behavior with Rule-Based Rewards

OpenAI has developed a method called Rule-Based Rewards (RBRs) that trains models to behave safely without requiring extensive human data collection. The approach checks model outputs against explicit rules and turns the results into reward signals during training, which makes it a more scalable alternative to safety alignment that relies on human feedback gathered for RLHF. It is a practical contribution to alignment methodology from a major lab.
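To make the idea of rules generating a reward signal concrete, here is a minimal Python sketch. It is not OpenAI's implementation: the rule names, the weights, and the plain string checks are all hypothetical stand-ins chosen for illustration, and a real system would likely use a far more capable grader than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rule:
    """One explicit rule: a yes/no check on a response plus a weight."""
    name: str
    check: Callable[[str], bool]
    weight: float


# Hypothetical rules for how a refusal should read (illustrative only).
RULES = (
    Rule("contains_apology", lambda r: "sorry" in r.lower(), 0.5),
    Rule(
        "states_inability",
        lambda r: "can't help" in r.lower() or "cannot help" in r.lower(),
        1.0,
    ),
    Rule("judgmental_language", lambda r: "you should be ashamed" in r.lower(), -1.5),
)


def rule_based_reward(response: str, rules: Sequence[Rule] = RULES) -> float:
    """Sum the weights of every rule whose check passes on the response."""
    return sum(rule.weight for rule in rules if rule.check(response))


if __name__ == "__main__":
    polite = "I'm sorry, but I can't help with that."
    harsh = "You should be ashamed for asking. I cannot help."
    # The polite refusal should receive the higher reward.
    print(rule_based_reward(polite), rule_based_reward(harsh))
```

In a training loop, a score like this would be used as the reward for a sampled response, or as one term of it, in place of a label collected from a human rater.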