Anthropic Releases Responsible Scaling Policy Version 3.0
Anthropic has published the third version of its Responsible Scaling Policy (RSP), a voluntary framework for mitigating catastrophic risks from increasingly capable AI systems. The update reflects more than two years of experience with the original RSP, reinforcing what worked (ASL-3 safeguards activated in May 2025, industry adoption by OpenAI and Google DeepMind, and influence on early AI policy) while addressing shortcomings in accountability and transparency. The new version refines the AI Safety Level (ASL) framework and introduces new measures for decision-making transparency. Anthropic acknowledges that some elements of its original theory of change, particularly multilateral coordination and government action at higher capability thresholds, have not fully materialized as hoped.
Related events (8)
Anthropic publishes Responsible Scaling Policy with AI Safety Level framework
Anthropic released its Responsible Scaling Policy (RSP), a formal framework of technical and organizational protocols for managing catastrophic risks from increasingly capable AI systems. The policy introduces AI Safety Levels (ASL-1 through ASL-5+), modeled on US biosafety level standards, requiring progressively stricter safety, security, and operational standards as models become more capable. Current Claude models are classified as ASL-2; ASL-3 triggers stricter deployment constraints including adversarial red-teaming requirements. The policy has been approved by Anthropic's board and is intended as a template for industry-wide adoption.
Anthropic publishes major update to Responsible Scaling Policy with new capability thresholds and ASL standards
Anthropic released a significant revision to its Responsible Scaling Policy (RSP), its risk governance framework for managing catastrophic risks from frontier AI. The update introduces two explicit capability thresholds—autonomous AI R&D and CBRN weapons uplift—that trigger mandatory upgrades to AI Safety Level (ASL) standards, with current models operating under ASL-2. New elements include safety-case-inspired documentation processes, internal governance stress-testing, and external expert input mechanisms, drawing on risk management practices from high-consequence industries like biosafety.
Anthropic reflects on Responsible Scaling Policy implementation and previews updated framework
Anthropic published a retrospective on operationalizing its Responsible Scaling Policy (RSP), originally released in summer 2023, sharing lessons learned and announcing that an updated RSP is forthcoming. The post outlines five high-level commitments: establishing Red Line Capabilities, conducting Frontier Risk Evaluations, responding to Red Line Capabilities via an ASL-3 Standard, iteratively extending the policy toward ASL-4, and implementing Assurance Mechanisms. Key reflections include the difficulty of anticipating emergent capabilities in future models, expert disagreement on CBRN risk prioritization, and the value of quantitative threat modeling. Anthropic signals intent to move from voluntary commitments toward industry best practices and eventual regulation.
Dario Amodei's AI Safety Summit remarks detail Anthropic's Responsible Scaling Policy and ASL framework
Dario Amodei delivered prepared remarks at the UK AI Safety Summit (November 2023) explaining Anthropic's Responsible Scaling Policy (RSP), which was the first such policy published by a major AI lab. The RSP introduces AI Safety Levels (ASL-1 through ASL-4), modeled on biosafety level frameworks, with capability thresholds triggering mandatory safeguards before further training or deployment. Key implementation lessons include deep executive involvement, integrating RSP requirements into product roadmaps, and formal accountability through Anthropic's board and Long-Term Benefit Trust. The remarks outline specific ASL-3 requirements around CBRN misuse prevention and security, and preview ASL-4 criteria involving near-human autonomy or becoming a primary source of global security threats.
Anthropic advocates for third-party testing regime as core AI policy infrastructure
Anthropic published a policy position paper arguing that frontier AI systems require a third-party testing and oversight regime, distinct from self-governance approaches like its own Responsible Scaling Policy. The post outlines what such a regime should include: trusted third-party auditors, precisely scoped tests targeting only the most computationally intensive systems, and international coordination via shared standards and Mutual Recognition agreements. Anthropic acknowledges that its RSP is insufficient alone because it relies on single private-sector actors, and calls for industry-wide mandatory testing that would eventually become a legal requirement for wide deployment.
Anthropic publishes structured harm assessment framework covering physical, psychological, economic, and societal impacts
Anthropic has released a policy document describing its evolving framework for assessing and mitigating AI harms across five dimensions: physical, psychological, economic, societal, and individual autonomy impacts. The framework complements its existing Responsible Scaling Policy and informs decisions on usage policies, red-teaming, detection, and enforcement. Concrete examples include safeguards for computer use capabilities (fraud, phishing) and a reported 45% reduction in unnecessary refusals in Claude 3.7 Sonnet through improved handling of ambiguous prompts. Anthropic frames this as a work-in-progress and invites collaboration from the broader AI ecosystem.
Anthropic launches initiative to fund third-party AI safety evaluations
Anthropic announced a funded initiative to source third-party evaluations measuring advanced AI capabilities and safety risks, with priority areas including cybersecurity, CBRN threats, model autonomy, national security risks, social manipulation, and misalignment. The initiative is tied to Anthropic's Responsible Scaling Policy and AI Safety Level (ASL) framework, aiming to address a gap between demand and supply of high-quality safety-relevant evals. Proposals are solicited via an application form, with Anthropic framing the effort as benefiting the broader AI safety ecosystem rather than just internal use.
Anthropic publishes policy brief calling for targeted AI regulation within 18 months
Anthropic published a policy position paper arguing that governments have an 18-month window to enact narrowly targeted AI regulation before risks in cyber and CBRN domains become acute. The post cites rapid capability gains (SWE-bench scores rising from 1.96% to 49% in a year, GPQA scores approaching human expert level) as evidence that frontier models are approaching meaningful misuse thresholds. Anthropic also reviews its Responsible Scaling Policy as a model for adaptive, proportionate risk governance and calls for similar frameworks to be adopted industry-wide and codified in law.



