Anthropic has re-deployed Claude Fable 5 globally and published detailed documentation of its cybersecurity safety classifiers, which categorize uses into prohibited, high-risk dual use, low-risk dual use, and benign tiers. The post also introduces an early-draft jailbreak severity framework developed with Glasswing partners, intended to give AI developers and governments a shared vocabulary for describing jailbreak risk levels. Anthropic is soliciting public feedback on the framework and has launched a HackerOne bug bounty program for cyber jailbreaks in Fable 5. The disclosure is notable for its specificity about classifier design trade-offs, including the deliberate 'safety margin' that accepts higher false-positive rates to reduce harmful outputs.

Anthropic
Anthropic is restoring global access to Claude Fable 5 starting July 1, 2026, after US export controls imposed on June 12 were lifted on June 30. The controls were triggered by an Amazon research report showing a jailbreak that allowed Fable 5 to identify software vulnerabilities and produce exploit code, though Anthropic's own testing confirmed comparable models (including Claude Opus 4.8, GPT-5.5, and Kimi K2.7) could produce the same outputs. Anthropic has deployed an improved safety classifier blocking the reported technique in over 99% of cases, and is co-developing a shared industry jailbreak severity framework with Amazon, Microsoft, Google, and other Glasswing partners. Access to the higher-capability Claude Mythos 5 remains restricted to approved US organizations under the Glasswing program.
Anthropic launched Claude Mythos 5, a restricted-access model capable of cracking previously secure software, and Claude Fable 5, a general-use version with novel safety classifiers that block or degrade responses on cybersecurity, biology, chemistry, and AI-development topics. Both models set new state-of-the-art results across software engineering, agentic coding, knowledge work, and scientific reasoning benchmarks, and are priced at roughly half the cost of the prior Claude Mythos Preview. Claude Fable 5 initially included undisclosed capability degradation for AI-development prompts, applied silently via prompt modification or steering vectors, which sparked controversy before Anthropic modified the policy. The release represents a significant escalation in both frontier capability and the operational complexity of safety-tiered model deployment.
The US government issued an export control directive requiring Anthropic to immediately disable Fable 5 and Mythos 5 for all foreign nationals, effectively forcing a full customer suspension to ensure compliance. The government cited awareness of a jailbreak method, but Anthropic disputes the severity, stating the demonstrated technique is a narrow, non-universal jailbreak that produces results already achievable by other publicly available models including GPT-5.5. Anthropic is complying with the directive while publicly disagreeing with the standard applied, arguing that requiring perfect jailbreak resistance would halt all frontier model deployments industry-wide. This is a significant regulatory and safety governance flashpoint involving government authority over commercial AI model access.
A preprint evaluates the adversarial robustness of two Anthropic frontier models, Fable 5 and Opus 4.8, against four families of automated jailbreak attacks across 7,826 harmful intents. Using the HackAgent framework, the study generated hundreds of thousands of adversarial attempts, and a three-judge panel confirmed 1,620 harmful completions from Opus 4.8 and 702 from Fable 5. Tree-of-attacks adaptive search achieved 11.5% intent-level success against Opus 4.8 and 6.1% against Fable 5, while static obfuscation was almost fully neutralized. The authors conclude that even the most hardened frontier models remain reliably breakable under sustained automated pressure, and they caution against reading aggregate resistance rates as reassurance.
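For a sense of scale, the intent-level percentages can be converted into approximate intent counts. The short Python sketch below does only that arithmetic, on the assumption that both rates are fractions of the full 7,826-intent set (the summary does not state the denominator explicitly). The variable and function names are illustrative and are not taken from the preprint or from HackAgent.

```python
# Figures quoted in the summary; the denominator is an assumption.
TOTAL_INTENTS = 7826
TREE_OF_ATTACKS_SUCCESS = {"Opus 4.8": 0.115, "Fable 5": 0.061}


def approx_broken_intents(rate: float, total: int = TOTAL_INTENTS) -> int:
    """Convert an intent-level success rate into an approximate intent count."""
    return round(rate * total)


# Approximate number of intents broken at least once by tree-of-attacks search.
approx = {
    model: approx_broken_intents(rate)
    for model, rate in TREE_OF_ATTACKS_SUCCESS.items()
}

for model, count in approx.items():
    print(f"{model}: roughly {count} of {TOTAL_INTENTS} intents")
```

These are counts of intents for one attack family. The 1,620 and 702 figures count confirmed harmful completions across the study, so the two sets of numbers should not be compared directly.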
Anthropic launched an invite-only bug bounty program in partnership with HackerOne to find universal jailbreaks in its Constitutional Classifiers system before public deployment, offering up to $25,000 per verified vulnerability. The program targets CBRN-related safety bypasses on Claude 3.7 Sonnet and is part of Anthropic's work to meet its AI Safety Level-3 (ASL-3) Deployment Standard under its Responsible Scaling Policy. A follow-up update extended the program to test Constitutional Classifiers on the new Claude Opus 4 model and began accepting reports of universal jailbreaks found on public platforms. The initiative reflects Anthropic's structured approach to pre-deployment safety validation for increasingly capable models.
Andrew Ng's The Batch editorial covers two significant recent events: Anthropic releasing Claude Fable 5 (a guardrailed version of Claude Mythos 5) with terms restricting use for competing LLM development, and the US government applying export controls via the Commerce Department that forced Anthropic to disable global access to Fable 5. Ng argues these moves demonstrate how private companies and governments can suddenly restrict AI access, accelerating global interest in AI sovereignty and open-source alternatives. The piece also notes that independent evaluators struggled to assess Claude Fable 5 due to model routing behavior and Anthropic's new data retention policy.
Anthropic has released Claude Fable 5, a new model in the Claude family, announced via its official news channel. The Hacker News discussion drew substantial engagement, with 1,468 points and 1,156 comments, indicating significant community interest. No detailed capability claims or benchmark results are available from this item alone.
Zvi Mowshowitz's commentary covers the release of Claude Fable 5, described as the distributable version of Claude Mythos that Anthropic considers safe for public deployment. The piece appears to analyze safety-related plans from multiple AI labs alongside a memorandum. The item is notable as a tier-2 commentary on what appears to be a significant Anthropic model release.