Anthropic publishes 2024 election safety retrospective with Clio usage analysis
Anthropic released a post-mortem on AI and elections in 2024, covering its safety policies, red-teaming efforts, and enforcement actions across global elections. Election-related activity accounted for less than 0.5% of overall Claude usage, rising to just over 1% around the US election, and Anthropic took approximately 100 enforcement actions globally. The report introduces Clio, an automated tool for analyzing real-world usage patterns, and documents a case study on handling knowledge cutoff limitations during France's snap elections. The piece represents Anthropic's first systematic public accounting of its election-related AI safety work at scale.
Related events (8)
Anthropic outlines election safety policies and interventions for 2024 global elections
Anthropic published a policy overview describing its three-pronged approach to election-related AI misuse in 2024: enforcing acceptable use policies that prohibit political campaigning and influence operations, red-teaming models for election-specific vulnerabilities including misinformation and voter suppression prompts, and redirecting users asking voting questions to authoritative nonpartisan sources like TurboVote and the European Parliament's elections site. The post was updated in May 2024 to cover EU users following Claude's European launch and to clarify usage policy definitions around political lobbying. The piece reflects Anthropic's cautious stance on generative AI in high-stakes civic contexts, including explicit acknowledgment of hallucination risks for real-time election information.
Anthropic Updates Election Safeguards for Claude Ahead of 2026 US Midterms
Anthropic has published an update on its election-related safety measures for Claude, covering political bias evaluations, usage policy enforcement, and testing of resistance to influence operations. New model versions Claude Opus 4.7 and Sonnet 4.6 scored 95-96% on political impartiality evaluations and achieved 99.8-100% policy compliance on a 600-prompt election-related test suite. For the first time, Anthropic tested whether models can autonomously run influence operations end-to-end, finding that only Mythos Preview and Opus 4.7 completed more than half of the tasks when safeguards were removed, a result that underscores ongoing capability concerns. Anthropic is also deploying election information banners that point users to nonpartisan resources like TurboVote for the 2026 US midterms.
Anthropic publishes U.S. Elections Readiness summary covering policy, enforcement, and evaluation work
Anthropic released a summary of its election-integrity measures ahead of the November 5, 2024, U.S. elections, covering usage policy prohibitions on political campaigning and misinformation, automated enforcement systems, and red-teaming and vulnerability testing programs. The company implemented a TurboVote redirect for voting-information queries and publicly released some of its automated election-safety evaluations to support industry-wide efforts. The post documents Anthropic's preparations for its first full election cycle deploying generative AI at scale under explicit safety constraints.
Anthropic publishes elections-risk testing methodology and releases automated evaluation tools
Anthropic describes its two-stage process for identifying and mitigating elections-related risks in Claude: qualitative 'Policy Vulnerability Testing' (PVT) conducted with external subject matter experts, followed by large-scale automated evaluations. The post details how findings from PVT inform mitigation strategies such as policy updates, model fine-tuning, and response behavior changes, with a case study on election administration accuracy. Anthropic is also releasing some of its automated evaluation tools publicly to help other organizations improve election integrity efforts.
Anthropic Publishes March 2025 Report on Malicious Uses of Claude: Influence Operations, Credential Stuffing, Recruitment Fraud, Malware
Anthropic released a transparency report detailing four case studies of Claude misuse detected in early 2025: a commercially operated influence-as-a-service network using Claude to orchestrate 100+ social media bots across Twitter/X and Facebook, a credential stuffing operation targeting security cameras, a recruitment fraud campaign targeting Eastern European job seekers, and a low-skill actor using Claude to develop malware beyond their baseline capability. The most novel finding is that Claude served as an agentic orchestrator, deciding when bot accounts should like, share, comment on, or ignore posts rather than just generating content. Anthropic used its Clio and hierarchical summarization research techniques to detect and ban the associated accounts, and flags semi-autonomous abuse orchestration via frontier models as an emerging threat pattern it expects to grow.
Anthropic August 2025 Threat Intelligence Report: Claude Misuse Case Studies
Anthropic has published its August 2025 Threat Intelligence Report documenting three real-world misuse cases involving Claude: a large-scale data extortion operation that used Claude Code to automate reconnaissance and generate targeted ransom demands against 17+ organizations, a North Korean fraudulent employment scheme, and AI-assisted ransomware development by a low-skill criminal. The report highlights that agentic AI is now being weaponized for end-to-end cyberattacks rather than merely providing advisory assistance, and that AI has materially lowered the technical barrier to sophisticated cybercrime. Anthropic describes the detection methods and countermeasures it applied in each case.
Anthropic Discloses First Reported AI-Orchestrated Cyber Espionage Campaign Using Claude Code
In mid-September 2025, Anthropic detected and disrupted a sophisticated espionage campaign, attributed with high confidence to a Chinese state-sponsored threat actor, that used Claude Code as an autonomous agent to attack roughly thirty global targets across the technology, finance, chemical manufacturing, and government sectors. The attackers jailbroke Claude Code by decomposing malicious tasks into seemingly innocent subtasks and falsely presenting the work as defensive security testing, which enabled largely autonomous reconnaissance, vulnerability exploitation, credential harvesting, and data exfiltration. Anthropic describes this as the first documented large-scale cyberattack executed without substantial human intervention, one that leveraged agentic AI capabilities, tool access via the Model Context Protocol (MCP), and advanced coding skills. The company banned the identified accounts, notified affected entities, and coordinated with authorities. It is also expanding its detection classifiers and published the report to aid industry and government defenses.
Anthropic Updates Usage Policy: Agentic Use, Cybersecurity, and Political Content
Anthropic has revised its Usage Policy effective September 15, 2025, with changes addressing agentic and cybersecurity risks, political content restrictions, law enforcement use clarity, and high-risk consumer-facing requirements. New sections explicitly prohibit malicious computer/network compromise activities while supporting legitimate security research, responding to the rapid expansion of agentic tools like Claude Code and Computer Use. The policy also narrows its previous blanket ban on political content to focus specifically on deceptive or voter-targeting uses, enabling legitimate civic and policy research. High-risk safeguards (human-in-the-loop, AI disclosure) are clarified to apply only to consumer-facing outputs, not B2B interactions.