Anthropic News · 1mo ago

Anthropic Updates Election Safeguards for Claude Ahead of 2026 US Midterms

Anthropic has published an update on its election-related safety measures for Claude, covering political bias evaluations, usage policy enforcement, and testing of resistance to influence operations. The new models Claude Opus 4.7 and Sonnet 4.6 scored 95-96% on political impartiality evaluations and achieved 99.8-100% policy compliance on a 600-prompt election-related test suite. For the first time, Anthropic tested whether models can autonomously run influence operations end-to-end; only Mythos Preview and Opus 4.7 completed more than half of the tasks when safeguards were removed, which underscores ongoing capability concerns. Anthropic is also deploying election information banners that point users to nonpartisan resources such as TurboVote for the 2026 US midterms.

Tags: Frontier Model Releases, Evaluation and Benchmarking, AI Safety Research, Regulatory Developments, Collective Intelligence Project, Claude Sonnet 4, Claude Opus 4.6, Mythos Preview, Claude's constitution, Democracy Works, The Future of Free Speech, TurboVote, Foundation for American Innovation, Anthropic

Related guides (4)

Claude Opus 4.6

Claude Opus 4.6: Anthropic's Milestone Model for Long-Context and Agentic Work


Frontier Model Releases (topic guide)

Frontier Model Releases: The Race From Language to Action


Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier


AI Safety Research (topic guide)

AI Safety Research: From Lab Evals to Geopolitical Flashpoint


Related events (8)

Anthropic News · 17d ago

Anthropic publishes 2024 election safety retrospective with Clio usage analysis

Anthropic released a post-mortem on AI and elections in 2024, covering its safety policies, red-teaming efforts, and enforcement actions across global elections. Election-related activity made up less than 0.5% of overall Claude usage, rising to just over 1% around the US election, with approximately 100 enforcement actions taken globally. The report introduces Clio, an automated tool for analyzing real-world usage patterns, and documents a case study on handling knowledge-cutoff limitations during France's snap elections. The piece represents Anthropic's first systematic public accounting of election-related AI safety work at scale.

Tags: AI Safety Research, Regulatory Developments, Claude Sonnet 3.5, Clio, Claude Opus 4.6 (+4 more)

Anthropic News · 18d ago

Anthropic outlines election safety policies and interventions for 2024 global elections

Anthropic published a policy overview describing its three-pronged approach to election-related AI misuse in 2024: enforcing acceptable use policies that prohibit political campaigning and influence operations, red-teaming models for election-specific vulnerabilities including misinformation and voter suppression prompts, and redirecting users asking voting questions to authoritative nonpartisan sources like TurboVote and the European Parliament's elections site. The post was updated in May 2024 to cover EU users following Claude's European launch and to clarify usage policy definitions around political lobbying. The piece reflects Anthropic's cautious stance on generative AI in high-stakes civic contexts, including explicit acknowledgment of hallucination risks for real-time election information.

Tags: AI Safety Research, Regulatory Developments, Claude, European Parliament, Democracy Works (+2 more)

Anthropic News · 20d ago

Anthropic Publishes Political Even-Handedness Evaluation for Claude, Open-Sources Methodology

Anthropic has released a detailed account of how it trains and evaluates Claude for political even-handedness, including character traits instilled via reinforcement learning since early 2024 and a new automated evaluation methodology. The evaluation tests thousands of prompts across hundreds of political stances and benchmarks Claude Sonnet 4.5 against GPT-5, Llama 4, Grok 4, and Gemini 2.5 Pro, finding Claude comparable to Grok 4 and Gemini 2.5 Pro and more even-handed than GPT-5 and Llama 4. Anthropic is open-sourcing the evaluation framework to encourage shared industry standards for measuring political bias. The post also discloses the specific system prompt language used on Claude.ai to enforce even-handed behavior.

Tags: Frontier Model Releases, Evaluation and Benchmarking, claude.ai, Claude Sonnet 4.5, Grok 4 (+8 more)

Anthropic News · 17d ago

Anthropic publishes U.S. Elections Readiness summary covering policy, enforcement, and evaluation work

Anthropic released a summary of its election-integrity measures ahead of the November 5, 2024 U.S. elections, covering usage policy prohibitions on political campaigning and misinformation, automated enforcement systems, and red-teaming and vulnerability-testing programs. The company implemented a TurboVote redirect for voting-information queries and publicly released some of its automated election-safety evaluations to support industry-wide efforts. The post documents Anthropic's first full election-cycle experience deploying generative AI at scale under explicit safety constraints.

Tags: AI Safety Research, Regulatory Developments, Claude, Democracy Works, Amazon Web Services (+3 more)

Anthropic News · 18d ago

Anthropic publishes elections-risk testing methodology and releases automated evaluation tools

Anthropic describes its two-stage process for identifying and mitigating elections-related risks in Claude: qualitative 'Policy Vulnerability Testing' (PVT) conducted with external subject matter experts, followed by large-scale automated evaluations. The post details how findings from PVT inform mitigation strategies such as policy updates, model fine-tuning, and response behavior changes, with a case study on election administration accuracy. Anthropic is also releasing some of its automated evaluation tools publicly to help other organizations improve election integrity efforts.

Tags: Evaluation and Benchmarking, AI Safety Research, Isabelle Frances-Wright, Claude, Policy Vulnerability Testing (+3 more)

The Batch · 20d ago

Claude Opus 4.8 Launches with Improved Honesty; Anthropic Previews Mythos-Class Models and Dynamic Workflows

Anthropic released Claude Opus 4.8 with improvements in coding, reasoning, and agentic tasks, and notably better uncertainty flagging: it is approximately four times less likely than Opus 4.7 to let code flaws pass without comment. Alongside the model, Anthropic introduced dynamic workflows in Claude Code, which enable tens to hundreds of parallel subagents for large-scale engineering tasks, as well as an effort-control slider and a 3x price cut on fast mode. Anthropic also previewed Mythos-class models, positioned above Opus in capability and currently available to a limited set of organizations for cybersecurity work pending broader safety clearance. The same digest covers MiniMax M3 (open-weights, ~60% on SWE-Bench Pro), Nvidia's RTX Spark superchip, the Cosmos 3 world model, and a GR00T/Unitree robotics partnership.

Tags: Frontier Model Releases, Evaluation and Benchmarking, Unitree, Harvey, Claude Mythos (+16 more)

Anthropic News · 18d ago

Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku

Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models); a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed; and a public beta of "computer use," a capability that lets Claude control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. The US and UK AI Safety Institutes (US AISI and UK AISI) conducted pre-deployment testing, and the model was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.

Tags: Frontier Model Releases, Evaluation and Benchmarking, OpenAI o1-preview, Amazon Bedrock, Claude 3.5 Sonnet (+15 more)

Anthropic News · 19d ago

Anthropic Details Claude Safeguards Team Structure and Multi-Layer Safety Approach

Anthropic has published a detailed overview of its internal Safeguards team, describing a multi-layer approach to preventing Claude misuse that spans policy development, model training influence, pre-deployment evaluation, and real-time enforcement. The team uses a Unified Harm Framework covering five dimensions (physical, psychological, economic, societal, autonomy) and conducts Policy Vulnerability Testing with external domain experts in areas like terrorism, child safety, and mental health. Pre-deployment evaluations include safety assessments, CBRNE-focused AI capability uplift testing with government partners, and bias evaluations. The post describes specific partnerships with organizations like the Institute for Strategic Dialogue and ThroughLine to inform election integrity and mental health response policies.

Tags: Evaluation and Benchmarking, AI Safety Research, Anthropic Safeguards Team, Anthropic Usage Policy, Claude (+5 more)