OpenAI Superalignment Fast Grants: $10M for Superhuman AI Safety Research
OpenAI is launching $10M in fast grants to fund external technical research on aligning and ensuring the safety of superhuman AI systems. Priority research areas include weak-to-strong generalization, interpretability, and scalable oversight. The program is part of OpenAI's broader Superalignment initiative, which aims to solve the alignment problem for superintelligent systems within four years.
Related events (8)
Advancing independent research on AI alignment
OpenAI is committing $7.5 million to The Alignment Project to fund independent AI alignment research. The grant is framed as part of broader efforts to address AGI safety and security risks. It is a notable move by OpenAI to fund alignment work outside its own walls.
Weak-to-Strong Generalization: OpenAI's New Superalignment Research Direction
OpenAI presents a new research direction for superalignment exploring whether weak supervisors can effectively control much stronger AI models by leveraging deep learning's generalization properties. The work addresses a core challenge in scalable oversight: as AI systems surpass human-level capabilities, human supervisors may be unable to reliably evaluate or correct model outputs. Initial results are described as promising, suggesting that weak-to-strong generalization may be a viable path toward aligning superhuman AI systems.
Announcing the OpenAI Safety Fellowship
OpenAI has announced a Safety Fellowship, described as a pilot program aimed at supporting independent safety and alignment research while developing the next generation of AI safety talent. The announcement is sparse on details but signals a structured investment in external safety research capacity. This follows broader industry trends of labs funding independent safety work to build the research ecosystem.
Google DeepMind launches $10M funding call for multi-agent AI safety research
Google DeepMind and unnamed partners have announced a $10M funding call targeting safety research for multi-agent AI systems. The initiative signals institutional recognition that multi-agent architectures present distinct safety challenges requiring dedicated research investment. It is a notable funding commitment from a leading AI lab, directed specifically at an underexplored safety domain.
OpenAI Awards Up to $2M in Grants for AI and Mental Health Research
OpenAI is launching a grant program of up to $2 million to fund research at the intersection of AI and mental health. The program targets studies examining real-world risks, benefits, and applications of AI with the goal of improving safety and well-being. No specific grantees or research directions are named in the announcement.
Our approach to alignment research
OpenAI outlines its alignment research strategy, centered on improving AI systems' ability to learn from human feedback and to assist humans in evaluating AI outputs. The stated long-term goal is to build a sufficiently aligned AI system capable of helping solve remaining alignment problems. This represents OpenAI's public framing of its scalable oversight and RLHF-centric research agenda as of mid-2022.
OpenAI Launches $1M Grant Program for Democratic AI Governance Experiments
OpenAI's nonprofit arm is awarding ten grants of $100,000 each to fund experiments in designing democratic processes for determining the rules AI systems should follow. The program aims to explore how collective human input can shape AI behavior within legal boundaries. This represents an early institutional effort to operationalize participatory governance for AI alignment decisions.
Anthropic launches initiative to fund third-party AI safety evaluations
Anthropic announced a funded initiative to source third-party evaluations measuring advanced AI capabilities and safety risks, with priority areas including cybersecurity, CBRN threats, model autonomy, national security risks, social manipulation, and misalignment. The initiative is tied to Anthropic's Responsible Scaling Policy and AI Safety Level (ASL) framework, aiming to address a gap between demand and supply of high-quality safety-relevant evals. Proposals are solicited via an application form, with Anthropic framing the effort as benefiting the broader AI safety ecosystem rather than just internal use.


