Entity · organization

Anthropic Policy Frontier Red Team

organizationactiveanthropic-policy-frontier-red-team-f97ff38a·5 events·first seen Jun 1, 2026

Aliases: Anthropic Policy Frontier Red Team, Anthropic Frontier Red Team

Co-occurring entities

More like this (12)

Frontier Red Team Anthropic Safeguards Team Anthropic Usage Policy Anthropic Responsible Scaling Policy Anthropic Economic Policy Framework Anthropic Advanced AI Framework Anthropic for Startups Anthropic Academy Anthropic Public Record Anthropic Beneficial Deployments Anthropic Economic Futures Research Fund OpenAI Frontier

Recent events (5)

6Anthropic News·Jun 4, 2026·source ↗

Anthropic publishes policy brief calling for targeted AI regulation within 18 months

Anthropic published a policy position paper arguing that governments have an 18-month window to enact narrowly-targeted AI regulation before risks in cyber and CBRN domains become acute. The post cites rapid capability gains—SWE-bench scores rising from 1.96% to 49% in a year, GPQA scores approaching human expert level—as evidence that frontier models are approaching meaningful misuse thresholds. Anthropic also reviews its Responsible Scaling Policy as a model for adaptive, proportionate risk governance and calls for similar frameworks to be adopted industry-wide and codified in law.

AI Safety Research Regulatory Developments Anthropic Policy Frontier Red Team Claude 3.5 Sonnet UK AI Security Institute +5 more

8Anthropic News·Jun 3, 2026·source ↗

Anthropic maps 832 AI-enabled cyberattacks, finds MITRE ATT&CK framework inadequate for agentic threats

Anthropic's Frontier Red Team analyzed 832 accounts banned for malicious cyber activity between March 2025 and March 2026, mapping their techniques against the MITRE ATT&CK framework. Key findings: medium-or-higher-risk actors grew from 33% to 56% across the study period; AI use is shifting from initial-access techniques toward post-compromise operations like lateral movement and privilege escalation; and traditional risk signals (technique count, platform used) no longer reliably distinguish threat levels. The report concludes that MITRE ATT&CK lacks coverage for agentic orchestration behaviors—where AI chains attack stages autonomously with minimal human input—which characterize the highest-risk actors, including a state-sponsored espionage operation disrupted in November 2025.

AI Safety Research Regulatory Developments Anthropic Policy Frontier Red Team MITRE ATT&CK Claude Code +3 more

7Anthropic News·Jun 2, 2026·source ↗

Anthropic and NNSA Co-Develop Nuclear Safeguards Classifier for Claude Traffic

Anthropic, in partnership with the U.S. Department of Energy's National Nuclear Security Administration (NNSA) and DOE national laboratories, has co-developed an AI classifier that distinguishes between concerning and benign nuclear-related conversations with 96% accuracy in preliminary testing. The classifier has already been deployed on live Claude traffic as part of Anthropic's misuse-detection infrastructure. Anthropic plans to share the approach with the Frontier Model Forum as a replicable blueprint for other AI developers. This represents the first public-private partnership of this kind for nuclear proliferation risk monitoring in frontier AI systems.

Evaluation and Benchmarking AI Safety Research Nuclear Proliferation Risk Classifier Claude Anthropic Policy Frontier Red Team +5 more

7Anthropic News·Jun 1, 2026·source ↗

Anthropic Launches Claude Code Security: AI-Powered Vulnerability Detection for Defenders

Anthropic has released Claude Code Security in limited research preview for Enterprise and Team customers, a capability built into Claude Code that scans codebases for security vulnerabilities and suggests patches for human review. Unlike rule-based static analysis tools, it uses Claude's reasoning to understand code context, trace data flows, and detect complex vulnerabilities including novel ones. Built on Claude Opus 4.6, the system found over 500 previously undetected vulnerabilities in production open-source codebases during internal research. The release is framed as a defensive measure to put AI-enabled vulnerability discovery in the hands of defenders before attackers can exploit the same capabilities.

Frontier Model Releases AI Safety Research Claude Opus 4.6 Anthropic Policy Frontier Red Team Pacific Northwest National Laboratory +5 more

8Anthropic News·Jun 1, 2026·source ↗

Claude Opus 4.6 Discovers 22 Firefox Vulnerabilities in Two-Week Mozilla Partnership

Anthropic's Claude Opus 4.6 identified 22 vulnerabilities in Firefox over two weeks in February 2026, of which Mozilla classified 14 as high-severity—representing nearly a fifth of all high-severity Firefox vulnerabilities remediated in 2025. The collaboration grew from internal evaluations showing Opus 4.5 was near-saturating CyberGym, a benchmark for LLM security capability, prompting Anthropic to test against a harder real-world target. Claude scanned nearly 6,000 C++ files and submitted 112 unique reports, with most issues patched in Firefox 148.0. The effort also included an evaluation of Claude's ability to write primitive exploits, probing the upper limits of AI-enabled offensive security capability.

Frontier Model Releases Evaluation and Benchmarking Firefox Claude Opus 4.6 Mozilla +8 more