AI Safety Level (ASL)
ai-safety-level-asl--02891777·4 events·first seen 15d agoAliases: AI Safety Level (ASL), AI Safety Level 2 (ASL-2), AI Safety Level 3 (ASL-3), AI Safety Level 2, AI Safety Levels
Co-occurring entities
More like this (12)
Recent events (4)
Anthropic Releases Responsible Scaling Policy Version 3.0
Anthropic has published the third version of its Responsible Scaling Policy (RSP), a voluntary framework for mitigating catastrophic risks from increasingly capable AI systems. The update reflects two-plus years of experience with the original RSP, reinforcing what worked (ASL-3 safeguards activated in May 2025, industry adoption by OpenAI and Google DeepMind, informing early AI policy) while addressing shortcomings in accountability and transparency. The new version refines the AI Safety Level (ASL) framework and introduces new measures for decision-making transparency. Anthropic acknowledges that some elements of its original theory of change—particularly multilateral coordination and government action at higher capability thresholds—have not fully materialized as hoped.
Dario Amodei's AI Safety Summit remarks detail Anthropic's Responsible Scaling Policy and ASL framework
Dario Amodei delivered prepared remarks at the UK AI Safety Summit (November 2023) explaining Anthropic's Responsible Scaling Policy (RSP), which was the first such policy published by a major AI lab. The RSP introduces AI Safety Levels (ASL-1 through ASL-4), modeled on biosafety level frameworks, with capability thresholds triggering mandatory safeguards before further training or deployment. Key implementation lessons include deep executive involvement, integrating RSP requirements into product roadmaps, and formal accountability through Anthropic's board and Long Term Benefit Trust. The remarks outline specific ASL-3 requirements around CBRN misuse prevention and security, and preview ASL-4 criteria involving near-human autonomy or becoming a primary source of global security threats.
Anthropic Launches Claude Haiku 4.5: Near-Frontier Performance at $1/$5 per Million Tokens
Anthropic has released Claude Haiku 4.5, a small model priced at $1/$5 per million input/output tokens that delivers coding performance comparable to Claude Sonnet 4 at one-third the cost and more than twice the speed. The model surpasses Sonnet 4 on computer use tasks and achieves 90% of Sonnet 4.5's performance on agentic coding evaluations, running 4-5x faster than Sonnet 4.5. Notably, Haiku 4.5 is classified under ASL-2 safety standards—less restrictive than the ASL-3 applied to Sonnet 4.5 and Opus 4.1—and is described as Anthropic's safest model by automated alignment metrics. It is available via the Claude API, Amazon Bedrock, and Google Cloud Vertex AI.
Anthropic Releases Computer Use Capability for Claude 3.5 Sonnet
Anthropic has launched a public beta of computer use for Claude 3.5 Sonnet, enabling the model to control a computer by interpreting screenshots and issuing pixel-level cursor and keyboard commands. The model achieves 14.9% on the OSWorld benchmark, roughly double the next-best AI model's 7.7%, though well below human-level performance of 70-75%. Anthropic trained the model on a small set of simple software tools and found it generalized rapidly to broader computer interaction. Safety analysis confirmed the capability remains at AI Safety Level 2, with prompt injection identified as a primary near-term risk.