glasswing-8ed78a4c · 1 event · first seen · Aliases: Glasswing
Anthropic has re-deployed Claude Fable 5 globally and published detailed documentation of its cybersecurity safety classifiers, which sort uses into four tiers: prohibited, high-risk dual use, low-risk dual use, and benign. The post also introduces an early-draft jailbreak severity framework, developed with Glasswing partners, that is intended to give AI developers and governments a shared vocabulary for describing jailbreak risk levels. Anthropic is soliciting public feedback on the framework and has launched a HackerOne bug bounty program for cyber jailbreaks in Fable 5. The disclosure is notable for its specificity about classifier design trade-offs, including a deliberate "safety margin" that accepts a higher false-positive rate in exchange for fewer harmful outputs.