Collective Alignment: OpenAI Surveys 1,000+ People on Model Spec Defaults
OpenAI conducted a global survey of over 1,000 participants to gather public input on how AI should behave, comparing responses against its existing Model Spec. The initiative, called 'collective alignment,' aims to shape AI default behaviors to better reflect diverse human values. Results are being used to update or validate Model Spec guidelines. This represents a structured attempt to incorporate democratic input into alignment policy.
Related events (8)
Introducing the Model Spec
OpenAI published its Model Spec, a document outlining the intended values, behaviors, and decision-making principles for its AI models. The spec defines a hierarchy of priorities—safety, ethics, adherence to OpenAI's principles, and helpfulness—and is intended to guide how models should behave across a wide range of situations. This represents OpenAI's formal attempt to codify alignment goals and behavioral norms into a publicly accessible framework.
Sharing the latest Model Spec
OpenAI has published an updated version of its Model Spec, the document that defines the values, behaviors, and priorities intended to guide its AI models. The Model Spec serves as a foundational alignment artifact, specifying how models should balance helpfulness, safety, and adherence to OpenAI's guidelines. This release reflects ongoing work in operationalizing alignment principles into training targets and behavioral policies.
Inside our approach to the Model Spec
OpenAI published a blog post explaining the philosophy and structure behind its Model Spec, a public framework governing model behavior. The post addresses how the spec balances safety, user autonomy, and accountability as AI systems become more capable. As a primary-source announcement, it sheds light on OpenAI's methodology for alignment and behavioral governance.
How should AI systems behave, and who should decide?
OpenAI published a policy post clarifying how ChatGPT's behavior is shaped and governed, outlining plans to allow greater user customization of model behavior. The post also describes intentions to solicit broader public input into decision-making around AI system behavior. This represents an early public articulation of OpenAI's approach to behavioral governance and value alignment in deployed systems.
OpenAI publishes public policy agenda covering safety, youth protection, and global standards
OpenAI released a formal public policy agenda outlining its positions on AI safety, youth protection, workforce transition, and international standards. The document sets out OpenAI's stated priorities for engaging with governments and regulators. As a primary source from a leading frontier lab, it signals how OpenAI intends to shape AI governance discussions.
Our approach to alignment research
OpenAI outlines its alignment research strategy, centered on improving AI systems' ability to learn from human feedback and to assist humans in evaluating AI outputs. The stated long-term goal is to build a sufficiently aligned AI system capable of helping solve remaining alignment problems. This represents OpenAI's public framing of its scalable oversight and RLHF-centric research agenda as of mid-2022.
OpenAI and Anthropic Share Findings from Joint Safety Evaluation
OpenAI and Anthropic conducted a first-of-its-kind cross-lab safety evaluation, testing each other's frontier models across dimensions including misalignment, instruction following, hallucinations, and jailbreaking resistance. The collaboration represents a novel form of inter-lab safety research cooperation. Findings highlight both progress and ongoing challenges in AI safety, and establish a potential template for future cross-organizational evaluations.
AI Safety Needs Social Scientists
OpenAI published a paper arguing that long-term AI safety research requires social scientists to address uncertainties in human psychology, rationality, emotion, and biases that affect alignment algorithms. The paper contends that aligning advanced AI with human values cannot be solved by machine learning alone. OpenAI announced plans to hire social scientists full-time to work on these problems.