Anthropic and NNSA Co-Develop Nuclear Safeguards Classifier for Claude Traffic
Anthropic, in partnership with the U.S. Department of Energy's National Nuclear Security Administration (NNSA) and DOE national laboratories, has co-developed an AI classifier that distinguishes between concerning and benign nuclear-related conversations with 96% accuracy in preliminary testing. The classifier has already been deployed on live Claude traffic as part of Anthropic's misuse-detection infrastructure. Anthropic plans to share the approach with the Frontier Model Forum as a replicable blueprint for other AI developers. This represents the first public-private partnership of this kind for nuclear proliferation risk monitoring in frontier AI systems.
Related guides (4)
Related events (8)
Anthropic Details Collaboration with US CAISI and UK AISI on Constitutional Classifier Red-Teaming
Anthropic has published an account of its ongoing voluntary partnership with the US Center for AI Standards and Innovation (CAISI) and UK AI Security Institute (AISI), in which government red-teamers were given deep access to pre-deployment versions of Constitutional Classifiers used on Claude Opus 4 and 4.1. The collaboration uncovered multiple vulnerability classes including prompt injection bypasses, cipher-based obfuscation attacks, universal jailbreaks via automated attack refinement, and input/output fragmentation exploits, each of which drove architectural improvements to Anthropic's safeguard systems. Key lessons shared include the value of providing unprotected model variants, real-time classifier score access, and detailed internal documentation to enable targeted red-teaming. The announcement frames government partnership as a core component of Anthropic's Safeguards approach rather than a one-off audit.
Anthropic partners with U.S. National Labs for 1,000 Scientist AI Jam evaluating Claude on scientific tasks
Anthropic is participating in the U.S. Department of Energy's first 1,000 Scientist AI Jam, bringing together scientists across multiple National Laboratories to evaluate frontier AI models on scientific research and national security applications. Claude 3.7 Sonnet, recently launched as the first hybrid reasoning model, will be a primary subject of evaluation across tasks including hypothesis generation, experiment planning, code generation, and result analysis. This builds on Anthropic's April 2024 collaboration with the National Nuclear Security Administration, which was the first instance of a frontier lab evaluating a model in a Top Secret classified environment. The partnership signals deepening government-industry collaboration on AI for scientific discovery and national security.
Anthropic Details Claude Safeguards Team Structure and Multi-Layer Safety Approach
Anthropic has published a detailed overview of its internal Safeguards team, describing a multi-layer approach to preventing Claude misuse that spans policy development, model training influence, pre-deployment evaluation, and real-time enforcement. The team uses a Unified Harm Framework covering five dimensions (physical, psychological, economic, societal, autonomy) and conducts Policy Vulnerability Testing with external domain experts in areas like terrorism, child safety, and mental health. Pre-deployment evaluations include safety assessments, CBRNE-focused AI capability uplift testing with government partners, and bias evaluations. The post describes specific partnerships with organizations like the Institute for Strategic Dialogue and ThroughLine to inform election integrity and mental health response policies.
Anthropic launches Claude Gov models for U.S. national security and classified environments
Anthropic has introduced Claude Gov, a custom set of Claude models built exclusively for U.S. national security customers operating in classified environments. The models are already deployed at the highest classification levels and feature reduced refusals on classified materials, enhanced intelligence and defense domain understanding, improved foreign language proficiency for national security use cases, and stronger cybersecurity data interpretation. The release marks a significant expansion of Anthropic's government AI strategy into sensitive national security applications.
Anthropic Partners with US Department of Energy on Genesis Mission for AI-Driven Scientific Discovery
Anthropic and the US Department of Energy have announced a multi-year partnership under the DOE's Genesis Mission initiative, targeting AI deployment across energy, biological sciences, and scientific productivity domains. The partnership will provide DOE researchers access to Claude and Anthropic engineers who will build purpose-built agents, Model Context Protocol servers, and specialized Claude Skills for scientific workflows. The collaboration has potential reach across all 17 US national laboratories and builds on prior work including a nuclear risk classifier with the National Nuclear Security Administration and Claude deployment at Lawrence Livermore. This represents a significant expansion of Anthropic's US government footprint.
Anthropic Frontier Red Team reports early-warning signs of rapid AI progress in cybersecurity and biosecurity capabilities
Anthropic's Frontier Red Team published findings from a year of safety evaluations across four model releases, documenting rapid capability gains in dual-use domains. In cybersecurity, Claude 3.7 Sonnet now solves roughly a third of Cybench CTF challenges (up from ~5% a year ago), and with the Incalmo toolset was able to replicate a large-scale network attack in realistic cyber range environments. In biosecurity, Claude has moved from underperforming virology experts to exceeding them on the VCT benchmark within one year, and exceeds human expert baselines on cloning workflows. Anthropic assesses current models as showing 'early warning' signs but not yet crossing thresholds of substantially elevated national security risk.
Anthropic makes Claude 3 Haiku and Sonnet available to US Intelligence Community and AWS GovCloud
Anthropic has made Claude 3 Haiku and Claude 3 Sonnet available via AWS Marketplace for the US Intelligence Community and AWS GovCloud, marking a significant expansion into government deployment. The company has crafted contractual exceptions to its general Usage Policy to permit legally authorized foreign intelligence analysis, including combating human trafficking and identifying covert influence campaigns, while maintaining restrictions on disinformation, weapons design, and malicious cyber operations. The deployment is currently limited to ASL-2 models under Anthropic's Responsible Scaling Policy. Anthropic also notes prior pre-release access to Claude 3.5 Sonnet was provided to the UK AI Safety Institute for pre-deployment testing.
Anthropic activates ASL-3 safety protections for Claude Opus 4 launch
Anthropic has activated its AI Safety Level 3 (ASL-3) Deployment and Security Standards in conjunction with launching Claude Opus 4, marking the first time any Anthropic model has been deployed under ASL-3 rather than the baseline ASL-2. The activation is described as precautionary: Anthropic has not conclusively determined that Opus 4 crosses the ASL-3 capability threshold, but cannot rule it out due to continued improvements in CBRN-related knowledge. ASL-3 measures include Constitutional Classifiers to block end-to-end CBRN weapon development workflows and enhanced model-weight security against sophisticated non-state attackers. Claude Sonnet 4 was evaluated and cleared for ASL-2, and ASL-4 was ruled out for Opus 4.



