6 · Anthropic News · 17d ago

Anthropic advocates for third-party testing regime as core AI policy infrastructure

Anthropic published a policy position paper arguing that frontier AI systems require a third-party testing and oversight regime, distinct from self-governance approaches such as its own Responsible Scaling Policy (RSP). The post outlines what such a regime should include: trusted third-party auditors, precisely scoped tests targeting only the most computationally intensive systems, and international coordination via shared standards and mutual recognition agreements. Anthropic acknowledges that its RSP is insufficient on its own because it relies on individual private-sector actors, and calls for industry-wide mandatory testing that would eventually become a legal requirement for wide deployment.

AI Safety Research · Regulatory Developments · ChatGPT · Claude · Responsible Scaling Policy · Gemini · Anthropic

Related guides (4)

Claude

Claude: Anthropic's AI Assistant Built for Safety and Scale

ChatGPT

ChatGPT: The AI Assistant That Changed How the World Talks to Computers

Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier

AI Safety Research · Topic guide

AI Safety Research: From Lab Evals to Geopolitical Flashpoint

Related events (8)

7 · Anthropic News · 16d ago

Anthropic launches initiative to fund third-party AI safety evaluations

Anthropic announced a funded initiative to source third-party evaluations measuring advanced AI capabilities and safety risks, with priority areas including cybersecurity, CBRN threats, model autonomy, national security risks, social manipulation, and misalignment. The initiative is tied to Anthropic's Responsible Scaling Policy and AI Safety Level (ASL) framework, aiming to address a gap between demand and supply of high-quality safety-relevant evals. Proposals are solicited via an application form, with Anthropic framing the effort as benefiting the broader AI safety ecosystem rather than just internal use.

Evaluation and Benchmarking · AI Safety Research · METR · Google-Proof Q&A · Responsible Scaling Policy · +1 more

5 · Anthropic News · 16d ago

Anthropic submits AI accountability recommendations to NTIA, covering evals, red teaming, and pre-registration

Anthropic submitted a formal response to the NTIA's Request for Comment on AI Accountability, outlining a multi-part policy framework for governing advanced AI systems. Key recommendations include increased government funding for evaluation research, mandatory disclosure of evaluation methods, pre-registration of large training runs with national governments, mandated external red teaming before model release, and antitrust guidance to enable industry safety collaboration. The submission reflects Anthropic's core policy positions and advocates for risk-tiered oversight proportional to model capabilities.

Evaluation and Benchmarking · AI Safety Research · National Institute of Standards and Technology · National Telecommunications and Information Administration · Anthropic · +1 more

8 · Anthropic News · 19d ago

Anthropic Releases Responsible Scaling Policy Version 3.0

Anthropic has published the third version of its Responsible Scaling Policy (RSP), a voluntary framework for mitigating catastrophic risks from increasingly capable AI systems. The update reflects two-plus years of experience with the original RSP, reinforcing what worked (ASL-3 safeguards activated in May 2025, industry adoption by OpenAI and Google DeepMind, informing early AI policy) while addressing shortcomings in accountability and transparency. The new version refines the AI Safety Level (ASL) framework and introduces new measures for decision-making transparency. Anthropic acknowledges that some elements of its original theory of change—particularly multilateral coordination and government action at higher capability thresholds—have not fully materialized as hoped.

Frontier Model Releases · Evaluation and Benchmarking · RAISE Act · EU AI Act · California SB 53 · +8 more

6 · Anthropic News · 16d ago

Anthropic publishes policy brief calling for targeted AI regulation within 18 months

Anthropic published a policy position paper arguing that governments have an 18-month window to enact narrowly targeted AI regulation before risks in cyber and CBRN domains become acute. As evidence that frontier models are approaching meaningful misuse thresholds, the post cites rapid capability gains: SWE-bench scores rising from 1.96% to 49% in a year, and GPQA scores approaching human expert level. Anthropic also reviews its Responsible Scaling Policy as a model for adaptive, proportionate risk governance and calls for similar frameworks to be adopted industry-wide and codified in law.

AI Safety Research · Regulatory Developments · Anthropic Policy · Frontier Red Team · Claude 3.5 Sonnet · UK AI Security Institute · +5 more

8 · Anthropic News · 17d ago

Anthropic publishes Responsible Scaling Policy with AI Safety Level framework

Anthropic released its Responsible Scaling Policy (RSP), a formal framework of technical and organizational protocols for managing catastrophic risks from increasingly capable AI systems. The policy introduces AI Safety Levels (ASL-1 through ASL-5+), modeled on US biosafety level standards, requiring progressively stricter safety, security, and operational standards as models become more capable. Current Claude models are classified as ASL-2; ASL-3 triggers stricter deployment constraints including adversarial red-teaming requirements. The policy has been approved by Anthropic's board and is intended as a template for industry-wide adoption.

Frontier Model Releases · AI Safety Research · ARC Evals · Claude · Responsible Scaling Policy · +3 more

5 · Anthropic News · 17d ago

Anthropic publishes frontier model security recommendations including multi-party authorization and secure development frameworks

Anthropic released a policy and technical guidance document outlining cybersecurity best practices for securing frontier AI models, including multi-party authorization for access to AI-critical infrastructure, adoption of the NIST SSDF and SLSA supply chain standards, and public-private cooperation modeled on critical infrastructure sectors. The post argues that advanced AI models warrant security levels far exceeding standard commercial practices and recommends government procurement requirements as a near-term enforcement mechanism. Anthropic states it is actively implementing these controls internally and calls on other labs and governments to adopt similar frameworks.

AI Safety Research · Regulatory Developments · Supply Chain Levels for Software Artifacts · National Institute of Standards and Technology · NIST Secure Software Development Framework · +1 more

6 · Anthropic News · 19d ago

Anthropic Responds to White House AI Action Plan, Calls for Transparency Standards and Export Controls

Anthropic published a policy response to the White House's 'Winning the Race: America's AI Action Plan,' endorsing its focus on AI infrastructure, federal adoption, and safety research while urging additional steps on export controls and mandatory AI development transparency standards. The company highlighted alignment between the plan and its prior OSTP submissions, and noted its proactive activation of ASL-3 protections with Claude Opus 4 as evidence that safety and innovation are compatible. Anthropic called for a single national standard for frontier model transparency rather than a state-by-state patchwork, and encouraged continued investment in NIST's CAISI for evaluating frontier models on national security risks including CBRN capabilities.

Frontier Model Releases · AI Safety Research · Claude Opus 4.6 · Center for AI Standards and Innovation · Office of Management and Budget · +9 more

6 · Anthropic News · 17d ago

Anthropic commits to signing the EU General-Purpose AI Code of Practice

Anthropic announced its intention to sign the EU's General-Purpose AI Code of Practice, citing alignment with its existing Responsible Scaling Policy on transparency, safety, and accountability. The company frames the Code's mandatory Safety and Security Frameworks—including CBRN risk assessment—as complementary to its own internal standards. Anthropic also signals continued collaboration with the EU AI Office and third-party bodies like the Frontier Model Forum to keep standards adaptive as the technology evolves.

AI Safety Research · Regulatory Developments · EU AI Act · EU General-Purpose AI Code of Practice · Frontier Model Forum · +3 more