7Anthropic News·18d ago

Anthropic publishes framework for safe and trustworthy agent development

Anthropic released a formal framework for responsible agent development, articulating principles around human oversight, transparency, value alignment, and privacy for autonomous AI agents. The document draws on Claude Code as a reference implementation and cites enterprise deployments at Trellix and Block as real-world examples. The framework is positioned as a contribution to emerging industry standards for agentic AI systems, acknowledging open technical challenges in value alignment measurement and oversight calibration.

AI Safety Research Regulatory Developments Agent and Tool Ecosystem Block Claude Code Trellix Anthropic

Related guides (4)

Claude Code

Claude Code: Anthropic's Autonomous Coding Agent

Read asBeginner In-depthfeatured

Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier

Read asBeginner

AI Safety ResearchTopic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Read asBeginner In-depth

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Read asIn-depth

Related events (8)

5Anthropic News·18d ago·source ↗

Anthropic publishes structured harm assessment framework covering physical, psychological, economic, and societal impacts

Anthropic has released a policy document describing their evolving framework for assessing and mitigating AI harms across five dimensions: physical, psychological, economic, societal, and individual autonomy impacts. The framework complements their existing Responsible Scaling Policy and informs decisions on usage policies, red-teaming, detection, and enforcement. Concrete examples include safeguards for computer use capabilities (fraud, phishing) and a reported 45% reduction in unnecessary refusals in Claude 3.7 Sonnet through improved handling of ambiguous prompts. Anthropic frames this as a work-in-progress and invites collaboration from the broader AI ecosystem.

AI Safety Research Alignment and RLHF Responsible Scaling Policy Claude 3.7 Sonnet Anthropic

6Anthropic News·17d ago·source ↗

Anthropic commits to signing the EU General-Purpose AI Code of Practice

Anthropic announced its intention to sign the EU's General-Purpose AI Code of Practice, citing alignment with its existing Responsible Scaling Policy on transparency, safety, and accountability. The company frames the Code's mandatory Safety and Security Frameworks—including CBRN risk assessment—as complementary to its own internal standards. Anthropic also signals continued collaboration with the EU AI Office and third-party bodies like the Frontier Model Forum to keep standards adaptive as the technology evolves.

AI Safety Research Regulatory Developments EU AI Act EU General-Purpose AI Code of Practice Frontier Model Forum +3 more

6Anthropic News·18d ago·source ↗

Anthropic publishes foundational 'Core Views on AI Safety' position paper

Anthropic released a detailed position paper outlining their core views on AI safety, arguing that transformative AI could arrive within a decade driven by predictable scaling laws, and that no one currently knows how to train powerful AI systems to robustly behave well. The document explains Anthropic's founding rationale and research strategy, highlighting four priority areas: scaling supervision, mechanistic interpretability, process-oriented learning, and understanding AI generalization. Originally published March 2023, this represents Anthropic's canonical public statement of their safety philosophy and strategic priorities.

AI Safety Research Alignment and RLHF GPT-3 mechanistic interpretability Anthropic

7Anthropic News·16d ago·source ↗

Anthropic launches initiative to fund third-party AI safety evaluations

Anthropic announced a funded initiative to source third-party evaluations measuring advanced AI capabilities and safety risks, with priority areas including cybersecurity, CBRN threats, model autonomy, national security risks, social manipulation, and misalignment. The initiative is tied to Anthropic's Responsible Scaling Policy and AI Safety Level (ASL) framework, aiming to address a gap between demand and supply of high-quality safety-relevant evals. Proposals are solicited via an application form, with Anthropic framing the effort as benefiting the broader AI safety ecosystem rather than just internal use.

Evaluation and Benchmarking AI Safety Research METR Google-Proof Q&A Responsible Scaling Policy +1 more

5Anthropic News·16d ago·source ↗

Anthropic submits AI accountability recommendations to NTIA, covering evals, red teaming, and pre-registration

Anthropic submitted a formal response to the NTIA's Request for Comment on AI Accountability, outlining a multi-part policy framework for governing advanced AI systems. Key recommendations include increased government funding for evaluation research, mandatory disclosure of evaluation methods, pre-registration of large training runs with national governments, mandated external red teaming before model release, and antitrust guidance to enable industry safety collaboration. The submission reflects Anthropic's core policy positions and advocates for risk-tiered oversight proportional to model capabilities.

Evaluation and Benchmarking AI Safety Research National Institute of Standards and Technology National Telecommunications and Information Administration Anthropic +1 more

8Anthropic News·17d ago·source ↗

Anthropic publishes Responsible Scaling Policy with AI Safety Level framework

Anthropic released its Responsible Scaling Policy (RSP), a formal framework of technical and organizational protocols for managing catastrophic risks from increasingly capable AI systems. The policy introduces AI Safety Levels (ASL-1 through ASL-5+), modeled on US biosafety level standards, requiring progressively stricter safety, security, and operational standards as models become more capable. Current Claude models are classified as ASL-2; ASL-3 triggers stricter deployment constraints including adversarial red-teaming requirements. The policy has been approved by Anthropic's board and is intended as a template for industry-wide adoption.

Frontier Model Releases AI Safety Research ARC Evals Claude Responsible Scaling Policy +3 more

5Anthropic News·17d ago·source ↗

Anthropic publishes frontier model security recommendations including multi-party authorization and secure development frameworks

Anthropic released a policy and technical guidance document outlining cybersecurity best practices for securing frontier AI models, including multi-party authorization to AI-critical infrastructure, adoption of NIST SSDF and SLSA supply chain standards, and public-private cooperation modeled on critical infrastructure sectors. The post argues that advanced AI models warrant security levels far exceeding standard commercial practices and recommends government procurement requirements as a near-term enforcement mechanism. Anthropic states it is actively implementing these controls internally and calls on other labs and governments to adopt similar frameworks.

AI Safety Research Regulatory Developments Supply Chain Levels for Software Artifacts National Institute of Standards and Technology NIST Secure Software Development Framework +1 more

5Anthropic News·17d ago·source ↗

Anthropic achieves ISO/IEC 42001:2023 certification for AI management systems

Anthropic has received accredited certification under ISO/IEC 42001:2023, the first international standard for AI governance and management systems, issued by Schellman Compliance LLC. The certification covers Anthropic's policies, testing, monitoring, transparency measures, and oversight structures for responsible AI development. Anthropic claims to be among the first frontier AI labs to achieve this certification, positioning it as external validation of their safety commitments alongside existing frameworks like their Responsible Scaling Policy and Constitutional AI.

AI Safety Research Regulatory Developments Schellman Compliance LLC Constitutional AI ANSI National Accreditation Board +3 more