Anthropic Publishes Political Even-Handedness Evaluation for Claude, Open-Sources Methodology
Anthropic has released a detailed account of how it trains and evaluates Claude for political even-handedness, including character traits instilled via reinforcement learning since early 2024 and a new automated evaluation methodology. The evaluation tests thousands of prompts across hundreds of political stances and benchmarks Claude Sonnet 4.5 against GPT-5, Llama 4, Grok 4, and Gemini 2.5 Pro, finding Claude comparable to Grok 4 and Gemini 2.5 Pro and more even-handed than GPT-5 and Llama 4. Anthropic is open-sourcing the evaluation framework to encourage shared industry standards for measuring political bias. The post also discloses the specific system prompt language used on Claude.ai to enforce even-handed behavior.
Related guides (4)
Related events (8)
Anthropic Updates Election Safeguards for Claude Ahead of 2026 US Midterms
Anthropic has published an update on its election-related safety measures for Claude, covering political bias evaluations, usage policy enforcement, and influence operation resistance testing. New model versions Claude Opus 4.7 and Sonnet 4.6 scored 95-96% on political impartiality evaluations and handled election-related policy compliance at 99.8-100% on a 600-prompt test suite. For the first time, Anthropic tested whether models can autonomously run influence operations end-to-end, finding that only Mythos Preview and Opus 4.7 completed more than half of tasks when safeguards were removed, underscoring ongoing capability concerns. Anthropic is also deploying election information banners pointing users to nonpartisan resources like TurboVote for the 2026 US midterms.
Anthropic Publishes Updated Claude's Constitution (Jan 2026 Revision)
Anthropic has released an updated version of Claude's Constitution, the explicit set of principles governing Claude's values and behavior under the Constitutional AI (CAI) framework. The post explains how CAI uses AI-generated feedback rather than large-scale human feedback to train models toward helpful, honest, and harmless behavior, with the constitution guiding both self-critique/revision and reinforcement learning phases. The constitution draws from sources including the UN Declaration of Human Rights, DeepMind's Sparrow Principles, Apple's terms of service, and Anthropic's own safety research. Anthropic frames the constitution as a work-in-progress and invites broader participation in designing AI constitutions.
Independent evaluators struggle to benchmark Claude Fable 5 due to Anthropic's safety classifiers and data retention policies
Multiple independent organizations found they could not fully evaluate Claude Fable 5 (the public-facing safeguarded version of Claude Mythos 5) because Anthropic's classifiers silently rerouted flagged prompts to the weaker Claude Opus 4.8 or refused them outright. Evaluators including Artificial Analysis, Vals AI, and ARC Prize Foundation each adopted different scoring strategies — blended, pure, or abstaining entirely — producing widely divergent rankings depending on how refusals were handled. On GPQA Diamond, Claude Fable 5's score swung from 93.18% (2nd place) to 55.56% (94th place) depending on whether refusals were counted as failures. The episode surfaces a structural tension between safety-oriented deployment constraints and the ability of the field to independently measure frontier model capabilities.
Introducing Claude 3.5 Sonnet
Anthropic launches Claude 3.5 Sonnet, the first model in its Claude 3.5 family, claiming it outperforms Claude 3 Opus and competitor models on GPQA, MMLU, and HumanEval benchmarks while operating at twice the speed and mid-tier pricing ($3/$15 per million tokens). The model features a 200K context window, improved vision capabilities, and an internal agentic coding evaluation score of 64% versus 38% for Opus. Alongside the model, Anthropic introduces Artifacts on Claude.ai, a dedicated workspace for real-time editing of AI-generated content. The model was pre-deployment evaluated by the UK AI Safety Institute and assessed at ASL-2.
Anthropic Publishes New Claude Constitution Under CC0 License
Anthropic has released a new foundational 'constitution' document that directly shapes Claude's values and behavior during training, replacing a previous list of standalone principles with a holistic explanatory framework. The document is written primarily for Claude itself, explaining the reasoning behind desired behaviors rather than just specifying rules, with the goal of enabling better generalization to novel situations. It establishes a priority hierarchy: broadly safe, broadly ethical, compliant with Anthropic guidelines, and genuinely helpful. The constitution is released under Creative Commons CC0 1.0, allowing unrestricted use, and plays a central role in generating synthetic training data.
Anthropic launches Claude publicly with two model tiers after closed alpha
Anthropic announced the public launch of Claude on March 14, 2023, following a closed alpha with partners including Notion, Quora, and DuckDuckGo. The release introduced two model variants — Claude (high-performance) and Claude Instant (lighter and faster) — accessible via chat interface and API. Early partners reported Claude produced fewer harmful outputs and was more steerable than competing models, with deployments spanning education, legal tech, productivity, and search.
Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks
Anthropic has released Claude Opus 4.6, its most capable model to date, featuring a 1M token context window in beta, improved agentic coding and planning capabilities, and adaptive thinking with developer-controlled effort levels. The model claims top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, outperforming OpenAI's GPT-5.2 by 144 Elo points on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.
Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku
Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models), a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed, and a public beta of 'computer use' — a capability allowing Claude to control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. Both safety institutes (US AISI and UK AISI) conducted pre-deployment testing, and the model was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.



