6·OpenAI Blog·1mo ago

OpenAI Superalignment Fast Grants: $10M for Superhuman AI Safety Research

OpenAI is launching $10M in fast grants to fund external technical research on aligning and ensuring the safety of superhuman AI systems. Priority research areas include weak-to-strong generalization, interpretability, and scalable oversight. The program is part of OpenAI's broader Superalignment initiative, which aims to solve the alignment problem for superintelligent systems within four years.

Evaluation and Benchmarking · AI Safety Research · Alignment and RLHF · Superalignment · interpretability · OpenAI Superalignment Fast Grants · weak-to-strong generalization · scalable oversight

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Alignment and RLHF · Topic guide

Alignment and RLHF: Teaching AI Models to Behave

Related events (8)

5·OpenAI Blog·1mo ago

Advancing independent research on AI alignment

OpenAI is committing $7.5 million to The Alignment Project to fund independent AI alignment research. The grant is framed as part of broader efforts to address AGI safety and security risks. The commitment is a notable move by OpenAI to fund alignment work outside its own walls.

AI Safety Research · Alignment and RLHF · The Alignment Project · OpenAI

8·OpenAI Blog·1mo ago

Weak-to-Strong Generalization: OpenAI's New Superalignment Research Direction

OpenAI presents a new research direction for superalignment exploring whether weak supervisors can effectively control much stronger AI models by leveraging deep learning's generalization properties. The work addresses a core challenge in scalable oversight: as AI systems surpass human-level capabilities, human supervisors may be unable to reliably evaluate or correct model outputs. Initial results are described as promising, suggesting that weak-to-strong generalization may be a viable path toward aligning superhuman AI systems.

Evaluation and Benchmarking · AI Safety Research · Superalignment · OpenAI · weak-to-strong generalization

5·OpenAI Blog·1mo ago

Announcing the OpenAI Safety Fellowship

OpenAI has announced a Safety Fellowship, described as a pilot program aimed at supporting independent safety and alignment research while developing the next generation of AI safety talent. The announcement is sparse on details but signals a structured investment in external safety research capacity. This follows broader industry trends of labs funding independent safety work to build the research ecosystem.

AI Safety Research · Alignment and RLHF · OpenAI Safety Fellowship · AI alignment · OpenAI

6·Google DeepMind Blog·10d ago

Google DeepMind launches $10M funding call for multi-agent AI safety research

Google DeepMind and unnamed partners have announced a $10M funding call targeting safety research for multi-agent AI systems. The initiative signals institutional recognition that multi-agent architectures present distinct safety challenges requiring dedicated research investment. This is a notable funding commitment from a tier-1 lab directed specifically at an underexplored safety domain.

AI Safety Research · Agent and Tool Ecosystem · Google DeepMind

4·OpenAI Blog·1mo ago

OpenAI Awards Up to $2M in Grants for AI and Mental Health Research

OpenAI is launching a grant program of up to $2 million to fund research at the intersection of AI and mental health. The program targets studies examining real-world risks, benefits, and applications of AI with the goal of improving safety and well-being. No specific grantees or research directions are named in the announcement.

AI Safety Research · OpenAI

5·OpenAI Blog·1mo ago

Our approach to alignment research

OpenAI outlines its alignment research strategy, centered on improving AI systems' ability to learn from human feedback and to assist humans in evaluating AI outputs. The stated long-term goal is to build a sufficiently aligned AI system capable of helping solve remaining alignment problems. This represents OpenAI's public framing of its scalable oversight and RLHF-centric research agenda as of mid-2022.

Evaluation and Benchmarking · AI Safety Research · Reinforcement Learning from Human Feedback · OpenAI · scalable oversight

5·OpenAI Blog·1mo ago

OpenAI Launches $1M Grant Program for Democratic AI Governance Experiments

OpenAI's nonprofit arm is awarding ten grants of $100,000 each to fund experiments in designing democratic processes for determining the rules AI systems should follow. The program aims to explore how collective human input can shape AI behavior within legal boundaries. This represents an early institutional effort to operationalize participatory governance for AI alignment decisions.

AI Safety Research · Regulatory Developments · OpenAI, Inc. · Democratic Inputs to AI · OpenAI

7·Anthropic News·17d ago

Anthropic launches initiative to fund third-party AI safety evaluations

Anthropic announced a funded initiative to source third-party evaluations measuring advanced AI capabilities and safety risks, with priority areas including cybersecurity, CBRN threats, model autonomy, national security risks, social manipulation, and misalignment. The initiative is tied to Anthropic's Responsible Scaling Policy and AI Safety Level (ASL) framework, aiming to address a gap between demand and supply of high-quality safety-relevant evals. Proposals are solicited via an application form, with Anthropic framing the effort as benefiting the broader AI safety ecosystem rather than just internal use.

Evaluation and Benchmarking · AI Safety Research · METR · Google-Proof Q&A · Responsible Scaling Policy