5 · Anthropic News · 18d ago

Anthropic publishes election-risk testing methodology and releases automated evaluation tools

Anthropic describes its two-stage process for identifying and mitigating election-related risks in Claude: qualitative Policy Vulnerability Testing (PVT) conducted with external subject-matter experts, followed by large-scale automated evaluations. The post details how PVT findings inform mitigations such as policy updates, model fine-tuning, and changes to response behavior, and includes a case study on the accuracy of election-administration information. Anthropic is also publicly releasing some of its automated evaluation tools to help other organizations strengthen their election-integrity efforts.

Evaluation and Benchmarking · AI Safety Research · Regulatory Developments · Isabelle Frances-Wright · Claude · Policy Vulnerability Testing · Institute for Strategic Dialogue · Anthropic

Related guides (4)

Claude

Claude: Anthropic's AI Assistant Built for Safety and Scale


Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier


AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints


Regulatory Developments · Topic guide

AI Regulatory Developments: From Voluntary Frameworks to Government Enforcement


Related events (8)

5 · Anthropic News · 17d ago

Anthropic publishes U.S. Elections Readiness summary covering policy, enforcement, and evaluation work

Anthropic released a summary of its election-integrity measures ahead of the November 5, 2024, U.S. elections. The summary covers usage policy prohibitions on political campaigning and misinformation, automated enforcement systems, and red-teaming and vulnerability-testing programs. The company implemented a TurboVote redirect for voting-information queries and publicly released some of its automated election-safety evaluations to support industry-wide efforts. The post documents Anthropic's first full election cycle deploying generative AI at scale under explicit safety constraints.

AI Safety Research · Regulatory Developments · Claude · Democracy Works · Amazon Web Services · +3 more

6 · Anthropic News · 1mo ago

Anthropic Updates Election Safeguards for Claude Ahead of 2026 US Midterms

Anthropic has published an update on its election-related safety measures for Claude, covering political bias evaluations, usage policy enforcement, and testing of resistance to influence operations. The new Claude Opus 4.7 and Sonnet 4.6 models scored 95-96% on political impartiality evaluations and achieved 99.8-100% compliance with election-related policies on a 600-prompt test suite. For the first time, Anthropic tested whether models can autonomously run influence operations end to end. Only Mythos Preview and Opus 4.7 completed more than half of the tasks with safeguards removed, underscoring ongoing capability concerns. Anthropic is also deploying election information banners that point users to nonpartisan resources such as TurboVote for the 2026 US midterms.

Frontier Model Releases · Evaluation and Benchmarking · Collective Intelligence Project · Claude Sonnet 4 · Claude Opus 4.6 · +9 more

5 · Anthropic News · 18d ago

Anthropic outlines election safety policies and interventions for 2024 global elections

Anthropic published a policy overview describing its three-pronged approach to election-related AI misuse in 2024. First, it enforces acceptable use policies that prohibit political campaigning and influence operations. Second, it red-teams models for election-specific vulnerabilities, including misinformation and voter-suppression prompts. Third, it redirects users who ask voting questions to authoritative nonpartisan sources such as TurboVote and the European Parliament's elections site. The post was updated in May 2024 to cover EU users following Claude's European launch and to clarify usage policy definitions around political lobbying. The piece reflects Anthropic's cautious stance on generative AI in high-stakes civic contexts, including an explicit acknowledgment of hallucination risks for real-time election information.

AI Safety Research · Regulatory Developments · Claude · European Parliament · Democracy Works · +2 more

6 · Anthropic News · 17d ago

Anthropic publishes 2024 election safety retrospective with Clio usage analysis

Anthropic released a post-mortem on AI and elections in 2024, covering its safety policies, red-teaming efforts, and enforcement actions across global elections. Election-related activity accounted for less than 0.5% of overall Claude usage and rose to just over 1% around the US election. Anthropic took approximately 100 enforcement actions globally. The report introduces Clio, an automated tool for analyzing real-world usage patterns, and documents a case study on handling knowledge-cutoff limitations during France's snap elections. The piece is Anthropic's first systematic public accounting of election-related AI safety work at scale.

AI Safety Research · Regulatory Developments · Claude Sonnet 3.5 · Clio · Claude Opus 4.6 · +4 more

5 · Anthropic News · 17d ago

Anthropic updates Usage Policy with election integrity, high-risk use case, and privacy rules

Anthropic revised its Acceptable Use Policy and renamed it the Usage Policy, effective June 6, 2024. The revision consolidates prohibited-use categories into 'Universal Usage Standards.' Key changes include explicit bans on AI-assisted election interference and political campaigning, plus new safety requirements for high-risk use cases such as healthcare and legal. The policy also expands access for minors via API partners, with safety disclosures, and strengthens privacy protections, including prohibitions on biometric inference and government-directed censorship. The update reflects both the evolving regulatory context and Anthropic's stated safety mission.

AI Safety Research · Regulatory Developments · Anthropic Usage Policy · Anthropic

5 · Anthropic News · 19d ago

Anthropic publishes structured harm assessment framework covering physical, psychological, economic, societal, and autonomy impacts

Anthropic has released a policy document describing its evolving framework for assessing and mitigating AI harms across five dimensions: physical, psychological, economic, societal, and individual autonomy impacts. The framework complements the company's existing Responsible Scaling Policy and informs decisions on usage policies, red-teaming, detection, and enforcement. Concrete examples include safeguards against fraud and phishing for computer use capabilities. The post also reports a 45% reduction in unnecessary refusals in Claude 3.7 Sonnet, attributed to improved handling of ambiguous prompts. Anthropic frames the framework as a work in progress and invites collaboration from the broader AI ecosystem.

AI Safety Research · Alignment and RLHF · Responsible Scaling Policy · Claude 3.7 Sonnet · Anthropic

5 · Anthropic News · 19d ago

Anthropic Details Claude Safeguards Team Structure and Multi-Layer Safety Approach

Anthropic has published a detailed overview of its internal Safeguards team, describing a multi-layer approach to preventing Claude misuse that spans policy development, model training influence, pre-deployment evaluation, and real-time enforcement. The team uses a Unified Harm Framework covering five dimensions (physical, psychological, economic, societal, autonomy) and conducts Policy Vulnerability Testing with external domain experts in areas like terrorism, child safety, and mental health. Pre-deployment evaluations include safety assessments, CBRNE-focused AI capability uplift testing with government partners, and bias evaluations. The post describes specific partnerships with organizations like the Institute for Strategic Dialogue and ThroughLine to inform election integrity and mental health response policies.

Evaluation and Benchmarking · AI Safety Research · Anthropic Safeguards Team · Anthropic Usage Policy · Claude · +5 more

7 · Anthropic News · 18d ago

Anthropic publishes frontier threats red teaming methodology and biosecurity findings

Anthropic describes its 'frontier threats red teaming' program, sharing its methodology and high-level findings from a biosecurity red-teaming project of more than 150 hours conducted with domain experts. The team found three things. Current frontier models can sometimes produce expert-level biological information. These risks are likely to grow as models scale and gain tool access. Unmitigated LLMs could accelerate bioweapon-related misuse within two to three years. Mitigations, including training-process changes and classifier-based filters, have been deployed. Anthropic is sharing its findings with governments and other labs and calling for more independent red-teaming efforts.

Frontier Model Releases · AI Safety Research · Dario Amodei · Constitutional AI · Anthropic