universal jailbreak
universal-jailbreak-c488ca30·1 events·first seen 15d agoAliases: universal jailbreak
Co-occurring entities
More like this (12)
Recent events (1)
Anthropic Details Collaboration with US CAISI and UK AISI on Constitutional Classifier Red-Teaming
Anthropic has published an account of its ongoing voluntary partnership with the US Center for AI Standards and Innovation (CAISI) and UK AI Security Institute (AISI), in which government red-teamers were given deep access to pre-deployment versions of Constitutional Classifiers used on Claude Opus 4 and 4.1. The collaboration uncovered multiple vulnerability classes including prompt injection bypasses, cipher-based obfuscation attacks, universal jailbreaks via automated attack refinement, and input/output fragmentation exploits, each of which drove architectural improvements to Anthropic's safeguard systems. Key lessons shared include the value of providing unprotected model variants, real-time classifier score access, and detailed internal documentation to enable targeted red-teaming. The announcement frames government partnership as a core component of Anthropic's Safeguards approach rather than a one-off audit.