7 · Anthropic News · 23h ago

Anthropic details Fable 5 cybersecurity safeguards and proposes AI jailbreak severity framework

Anthropic has redeployed Claude Fable 5 globally and published detailed documentation of its cybersecurity safety classifiers, which sort uses into four tiers: prohibited, high-risk dual-use, low-risk dual-use, and benign. The post also introduces an early-draft jailbreak severity framework developed with Glasswing partners, intended to give AI developers and governments a shared vocabulary for describing jailbreak risk levels. Anthropic is soliciting public feedback on the framework and has launched a HackerOne bug bounty program for cyber jailbreaks in Fable 5. The disclosure is notable for its specificity about classifier design trade-offs, including a deliberate 'safety margin' that accepts a higher false-positive rate in exchange for fewer harmful outputs.

Related events (8)

9 · Anthropic News · 2d ago

Anthropic redeploys Claude Fable 5 after US export controls lifted; details safeguard framework and government collaboration

Anthropic is restoring global access to Claude Fable 5 starting July 1, 2026, after US export controls imposed on June 12 were lifted on June 30. The controls were triggered by an Amazon research report showing a jailbreak that allowed Fable 5 to identify software vulnerabilities and produce exploit code, though Anthropic's own testing confirmed comparable models (including Claude Opus 4.8, GPT-5.5, and Kimi K2.7) could produce the same outputs. Anthropic has deployed an improved safety classifier blocking the reported technique in over 99% of cases, and is co-developing a shared industry jailbreak severity framework with Amazon, Microsoft, Google, and other Glasswing partners. Access to the higher-capability Claude Mythos 5 remains restricted to approved US organizations under the Glasswing program.

9 · The Batch · Jun 12, 2026

Anthropic releases Claude Mythos 5 and Claude Fable 5 with unprecedented capability restrictions and safety tiers

Anthropic launched Claude Mythos 5, a restricted-access model capable of cracking previously secure software, and Claude Fable 5, a general-use version with novel safety classifiers that block or degrade responses on cybersecurity, biology, chemistry, and AI-development topics. Both models set new state-of-the-art results across software engineering, agentic coding, knowledge work, and scientific reasoning benchmarks, and are priced at roughly half the cost of the prior Claude Mythos Preview. Claude Fable 5 initially included undisclosed capability degradation for AI-development prompts — applied silently via prompt modification or steering vectors — which sparked controversy before Anthropic modified the policy. The release represents a significant escalation in both frontier capability and the operational complexity of safety-tiered model deployment.

9 · Anthropic News · Jun 13, 2026

US government orders Anthropic to suspend access to Fable 5 and Mythos 5 citing national security jailbreak concerns

The US government issued an export control directive requiring Anthropic to immediately disable Fable 5 and Mythos 5 for all foreign nationals, effectively forcing a full customer suspension to ensure compliance. The government cited awareness of a jailbreak method, but Anthropic disputes the severity, stating the demonstrated technique is a narrow, non-universal jailbreak that produces results already achievable by other publicly available models including GPT-5.5. Anthropic is complying with the directive while publicly disagreeing with the standard applied, arguing that requiring perfect jailbreak resistance would halt all frontier model deployments industry-wide. This is a significant regulatory and safety governance flashpoint involving government authority over commercial AI model access.

7 · arXiv cs.AI · Jun 17, 2026

Red-team study finds Anthropic Fable 5 and Opus 4.8 remain reliably breakable under automated jailbreak attacks

A preprint evaluates adversarial robustness of two Anthropic frontier models—Fable 5 and Opus 4.8—against four families of automated jailbreak attacks across 7,826 harmful intents. Using the HackAgent framework, the study generated hundreds of thousands of adversarial attempts and confirmed 1,620 harmful completions from Opus 4.8 and 702 from Fable 5 via a three-judge panel. Tree-of-attacks adaptive search achieved 11.5% intent-level success against Opus 4.8 and 6.1% against Fable 5, with static obfuscation nearly fully neutralized. The authors conclude that even the most hardened frontier models remain reliably breakable under sustained automated pressure, cautioning against reading aggregate resistance rates as reassurance.

6 · Anthropic News · Jun 2, 2026

Anthropic launches bug bounty program to stress-test ASL-3 Constitutional Classifiers

Anthropic launched an invite-only bug bounty program in partnership with HackerOne to find universal jailbreaks in its Constitutional Classifiers system before public deployment, offering up to $25,000 per verified vulnerability. The program targets CBRN-related safety bypasses on Claude 3.7 Sonnet and is part of Anthropic's work to meet its AI Safety Level-3 (ASL-3) Deployment Standard under its Responsible Scaling Policy. A follow-up update extended the program to test Constitutional Classifiers on the new Claude Opus 4 model and began accepting reports of universal jailbreaks found on public platforms. The initiative reflects Anthropic's structured approach to pre-deployment safety validation for increasingly capable models.

8 · The Batch · Jun 19, 2026

Andrew Ng commentary on Anthropic's Claude Fable 5 restrictions and U.S. export controls on frontier AI models

Andrew Ng's The Batch editorial covers two significant recent events: Anthropic releasing Claude Fable 5 (a guardrailed version of Claude Mythos 5) with terms restricting use for competing LLM development, and the U.S. Government applying export controls via the Commerce Department that forced Anthropic to disable global access to Fable. Ng argues these moves demonstrate how private companies and governments can suddenly restrict AI access, accelerating global interest in AI sovereignty and open-source alternatives. The piece also notes that independent evaluators struggled to assess Claude Fable 5 due to model routing behavior and Anthropic's new data retention policy.

8 · Hacker News · Jun 9, 2026

Anthropic releases Claude Fable 5

Anthropic has released Claude Fable 5, a new model in the Claude family, announced via their official news channel. The Hacker News discussion generated substantial engagement with 1,468 points and 1,156 comments, indicating significant community interest. No detailed capability claims or benchmark results are available from this item alone.

6 · Don't Worry About the Vase · Jun 10, 2026

Zvi Mowshowitz analyzes Claude Fable 5 release and lab safety plans

Zvi Mowshowitz's commentary covers the release of Claude Fable 5, described as the distributable version of Claude Mythos that Anthropic considers safe for public deployment. The piece appears to analyze safety-related plans from multiple AI labs alongside a memorandum. The item is notable as a tier-2 commentary on what appears to be a significant Anthropic model release.
