Entity · technique

AI Safety Level (ASL)

technique · active · ai-safety-level-asl--02891777 · 4 events · first seen Jun 1, 2026

Aliases: AI Safety Level (ASL), AI Safety Level 2 (ASL-2), AI Safety Level 3 (ASL-3), AI Safety Level 2, AI Safety Levels

More like this (12)

AI Safety Level Standards · ASL-2 · ASL-3 · Harmonizing AI Safety Thresholds · AASIST · AI Safety Fund · UK Artificial Intelligence Safety Institute · Australia AI Safety Institute · AI Liability Directive · Japan AI Safety Institute · RAS: Measuring LLM Safety Through Refusal Alignment · AI biosecurity risk assessment

Recent events (4)

7 · Anthropic News · Jun 4, 2026 · source ↗

Dario Amodei's AI Safety Summit remarks detail Anthropic's Responsible Scaling Policy and ASL framework

Dario Amodei delivered prepared remarks at the UK AI Safety Summit (November 2023) explaining Anthropic's Responsible Scaling Policy (RSP), which was the first such policy published by a major AI lab. The RSP introduces AI Safety Levels (ASL-1 through ASL-4), modeled on biosafety level frameworks, in which capability thresholds trigger mandatory safeguards before further training or deployment. Key implementation lessons include deep executive involvement, integration of RSP requirements into product roadmaps, and formal accountability through Anthropic's board and Long-Term Benefit Trust. The remarks outline specific ASL-3 requirements around CBRN misuse prevention and security, and preview ASL-4 criteria, which involve a model reaching near-human autonomy or becoming a primary source of global security threats.

Frontier Model Releases · AI Safety Research · Dario Amodei · UK AI Safety Summit · Alignment Research Center · +5 more

8 · Anthropic News · Jun 2, 2026 · source ↗

Anthropic Releases Computer Use Capability for Claude 3.5 Sonnet

Anthropic has launched a public beta of computer use for Claude 3.5 Sonnet, enabling the model to control a computer by interpreting screenshots and issuing pixel-level cursor and keyboard commands. The model achieves 14.9% on the OSWorld benchmark, roughly double the next-best AI model's 7.7%, though well below human-level performance of 70-75%. Anthropic trained the model on a small set of simple software tools and found that it generalized rapidly to broader computer interaction. Anthropic's safety analysis concluded that the model, including its computer use capability, remains at AI Safety Level 2, and identified prompt injection as a primary near-term risk.

Evaluation and Benchmarking · AI Safety Research · prompt injection · Claude 3.5 Sonnet · Responsible Scaling Policy · +6 more

7 · Anthropic News · Jun 1, 2026 · source ↗

Anthropic Launches Claude Haiku 4.5: Near-Frontier Performance at $1/$5 per Million Tokens

Anthropic has released Claude Haiku 4.5, a small model priced at $1/$5 per million input/output tokens that delivers coding performance comparable to Claude Sonnet 4 at one-third the cost and more than twice the speed. The model surpasses Sonnet 4 on computer use tasks and achieves 90% of Sonnet 4.5's performance on agentic coding evaluations, running 4-5x faster than Sonnet 4.5. Notably, Haiku 4.5 is classified under ASL-2 safety standards—less restrictive than the ASL-3 applied to Sonnet 4.5 and Opus 4.1—and is described as Anthropic's safest model by automated alignment metrics. It is available via the Claude API, Amazon Bedrock, and Google Cloud Vertex AI.
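The per-million-token pricing quoted above can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: the function name and the example token counts are hypothetical, and it assumes the listed $1/$5 rates apply uniformly, ignoring any discounts or platform-specific pricing.

```python
# Rates taken from the summary above: USD per million tokens for Claude Haiku 4.5.
INPUT_USD_PER_MTOK = 1.00
OUTPUT_USD_PER_MTOK = 5.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
cost = request_cost_usd(200_000, 50_000)
print(f"${cost:.2f}")  # prints "$0.45"
```

Because output tokens are billed at five times the input rate, the 50,000 output tokens in this example cost more ($0.25) than the 200,000 input tokens ($0.20).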

Frontier Model Releases · Evaluation and Benchmarking · Claude Sonnet 4 · Amazon Bedrock · Claude Opus 4.6 · +15 more

8 · Anthropic News · Jun 1, 2026 · source ↗

Anthropic Releases Responsible Scaling Policy Version 3.0

Anthropic has published the third version of its Responsible Scaling Policy (RSP), a voluntary framework for mitigating catastrophic risks from increasingly capable AI systems. The update reflects more than two years of experience with the original RSP, reinforcing what worked (ASL-3 safeguards activated in May 2025, adoption of similar frameworks by OpenAI and Google DeepMind, and influence on early AI policy) while addressing shortcomings in accountability and transparency. The new version refines the AI Safety Level (ASL) framework and introduces new measures for decision-making transparency. Anthropic acknowledges that some elements of its original theory of change, particularly multilateral coordination and government action at higher capability thresholds, have not fully materialized as hoped.

Frontier Model Releases · Evaluation and Benchmarking · RAISE Act · EU AI Act · California SB 53 · +8 more