Anthropic Fable 5

model · active · provisional · anthropic-fable-5-108a53ca · 1 event · first seen 11h ago

Aliases: Anthropic Fable 5

Recent events (1)

7 · arXiv · cs.AI · 11h ago

Red-team study finds Anthropic Fable 5 and Opus 4.8 remain reliably breakable under automated jailbreak attacks

A preprint evaluates the adversarial robustness of two Anthropic frontier models, Fable 5 and Opus 4.8, against four families of automated jailbreak attacks across 7,826 harmful intents. Using the HackAgent framework, the study generated hundreds of thousands of adversarial attempts; a three-judge panel confirmed 1,620 harmful completions from Opus 4.8 and 702 from Fable 5. Tree-of-attacks adaptive search achieved 11.5% intent-level success against Opus 4.8 and 6.1% against Fable 5, while static obfuscation was almost entirely neutralized. The authors conclude that even the most hardened frontier models remain reliably breakable under sustained automated pressure, and they caution against reading aggregate resistance rates as reassurance.