Entity · technique

universal jailbreak

technique · active · universal-jailbreak-c488ca30 · 1 event · first seen Jun 2, 2026

Aliases: universal jailbreak

Co-occurring entities

Constitutional Classifiers · prompt injection · Claude Opus 4.6 · Center for AI Standards and Innovation · UK AI Security Institute · Anthropic

More like this (12)

Jailbreak · black-box jailbreaking · JailbreakBench · iOSWorld · WWDC · Hacktivate AI · Xcode 26 · execution sandboxing · Xcode · activation patching · AppWorld · unclecode

Recent events (1)

Anthropic News · Jun 2, 2026

Anthropic Details Collaboration with US CAISI and UK AISI on Constitutional Classifier Red-Teaming

Anthropic has published an account of its ongoing voluntary partnership with the US Center for AI Standards and Innovation (CAISI) and the UK AI Security Institute (AISI), in which government red-teamers received deep access to pre-deployment versions of the Constitutional Classifiers used on Claude Opus 4 and 4.1. The collaboration uncovered several vulnerability classes: prompt injection bypasses, cipher-based obfuscation attacks, universal jailbreaks produced through automated attack refinement, and input/output fragmentation exploits. Each drove architectural improvements to Anthropic's safeguard systems. Key lessons include the value of giving red-teamers unprotected model variants, real-time access to classifier scores, and detailed internal documentation, which together enable targeted red-teaming. The announcement frames government partnership as a core component of Anthropic's Safeguards approach rather than a one-off audit.

Frontier Model Releases · Evaluation and Benchmarking · Constitutional Classifiers · prompt injection · Claude Opus 4.6 · +6 more