AI Safety Needs Social Scientists
OpenAI published a paper arguing that long-term AI safety research requires social scientists to address uncertainties in human psychology, rationality, emotion, and biases that affect alignment algorithms. The paper contends that aligning advanced AI with human values cannot be solved by machine learning alone. OpenAI announced plans to hire social scientists full-time to work on these problems.
Concrete Problems in AI Safety
OpenAI, Google Brain, Berkeley, and Stanford researchers co-authored 'Concrete Problems in AI Safety,' a foundational paper exploring research challenges in ensuring modern ML systems operate as intended. The paper identifies and frames specific technical safety problems for the field. Published in June 2016, it became a landmark reference for AI safety research agendas.
Helping people when they need it most: OpenAI's approach to mental health safety
OpenAI published a blog post outlining its philosophy and practices around safety for users experiencing mental or emotional distress. The post addresses the limitations of current AI systems in these contexts and describes ongoing work to improve them. It is a public statement of OpenAI's safety posture for sensitive use cases involving vulnerable users.
OpenAI Policy Paper: Four Strategies for Industry Cooperation on AI Safety
OpenAI published a policy research paper identifying four strategies to foster long-term industry cooperation on AI safety norms: communicating risks and benefits, technical collaboration, increased transparency, and incentivizing standards. The paper argues that competitive pressures risk creating a collective action problem where AI companies under-invest in safety. The analysis frames industry-wide coordination as essential to ensuring AI systems are safe and beneficial.
Preparing for malicious uses of AI
OpenAI co-authored a multi-institutional paper forecasting how malicious actors could misuse AI technology. The paper was produced over nearly a year in collaboration with FHI, CSER, CNAS, EFF, and others; it outlines potential threat vectors and proposes prevention and mitigation strategies. It is an early coordinated effort among AI safety and policy organizations to address AI misuse risks systematically.
OpenAI and Anthropic Share Findings from Joint Safety Evaluation
OpenAI and Anthropic conducted a first-of-its-kind cross-lab safety evaluation, testing each other's frontier models across dimensions including misalignment, instruction following, hallucinations, and jailbreaking resistance. The findings highlight both progress and ongoing challenges in AI safety, and the exercise offers a potential template for future cross-organizational evaluations.
Our approach to AI safety
OpenAI published a high-level overview of its approach to AI safety, framing safe development and deployment as central to its mission. The post appears to be a brief, top-level statement rather than a detailed technical or policy document. It signals OpenAI's public positioning on safety at a time of growing regulatory and public scrutiny.
Anthropic launches initiative to fund third-party AI safety evaluations
Anthropic announced a funded initiative to source third-party evaluations measuring advanced AI capabilities and safety risks, with priority areas including cybersecurity, CBRN threats, model autonomy, national security risks, social manipulation, and misalignment. The initiative is tied to Anthropic's Responsible Scaling Policy and AI Safety Level (ASL) framework, and aims to close a gap between the demand for and supply of high-quality safety-relevant evaluations. Proposals are solicited via an application form, and Anthropic frames the effort as benefiting the broader AI safety ecosystem rather than just internal use.
Our approach to alignment research
OpenAI outlined its alignment research strategy, which centers on improving AI systems' ability to learn from human feedback and to assist humans in evaluating AI outputs. The stated long-term goal is to build a sufficiently aligned AI system that can help solve the remaining alignment problems. The post is OpenAI's public framing of its scalable-oversight and RLHF-centric research agenda as of mid-2022.


