5·Anthropic News·1mo ago

Anthropic Launches Multi-Tradition Dialogue Program on AI Moral Formation

Anthropic has begun a structured outreach program engaging scholars, clergy, philosophers, and ethicists from over 15 religious and cross-cultural traditions to inform Claude's character development and values training. The initiative is framed as a research workstream on the 'moral formation' of AI systems, feeding directly into Claude's constitution and alignment evaluations. One concrete experiment emerged from these dialogues: giving Claude a mid-task tool that surfaces its own ethical commitments, which measurably lowered rates of misaligned behavior on internal evaluations. Anthropic plans to expand engagement to legal scholars, psychologists, and civic institutions, with future discussions addressing AI's impact on work, institutions, and the distribution of power.

AI Safety Research · Alignment and RLHF · Claude · Claude's constitution · ethical commitment reminder tool · Anthropic

Related guides (4)

Claude

Claude: Anthropic's AI Assistant Built for Safety and Scale

Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Alignment and RLHF · Topic guide

Alignment and RLHF: From Human Feedback to Scalable Post-Training

Related events (8)

4·Anthropic News·19d ago

Anthropic and Teach For All Launch Global AI Training Initiative for Educators in 63 Countries

Anthropic is partnering with Teach For All to provide Claude access and AI training to over 100,000 teachers and alumni across 63 countries through the AI Literacy & Creator Collective (LCC). The initiative positions educators as co-architects of AI development, with teachers providing product feedback directly to Anthropic while building classroom tools using Claude Artifacts. The program includes a live learning series, a peer community hub with 1,000+ educators, and a Claude Lab innovation space with direct access to Anthropic's product team. Early deployments include a climate curriculum in Liberia and a gamified math app in Bangladesh.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · AI Literacy & Creator Collective · Claude Pro · Teach For America · +7 more

6·Anthropic News·9d ago

Anthropic launches Claude Corps: $150M fellowship program placing AI-trained workers at nonprofits

Anthropic is launching Claude Corps, a national fellowship program committing $150M to train 1,000 early-career fellows in Claude usage and place them full-time at nonprofits across the US for 12-month stints at $85,000 salaries. The program is structured as a three-way partnership between Anthropic (funding and Claude expertise), CodePath (employer of record and training), and Social Finance (measurement and scaling vehicle). Anthropic frames the initiative as a direct response to AI-driven labor disruption, aiming to both equip nonprofits with AI capabilities and build AI skills in workers absorbing economic change. The $150M initial commitment is positioned as a foundation for a larger, scalable model.

Enterprise Deployment Patterns · Regulatory Developments · Social Finance · Claude · Claude Corps · +2 more

6·Anthropic News·26d ago

Anthropic co-founder Chris Olah speaks at Vatican on Pope Leo XIV's AI encyclical 'Magnifica humanitas'

Pope Leo XIV released an encyclical titled 'Magnifica humanitas: On safeguarding the human person in the time of artificial intelligence' on May 25, 2026, and Anthropic co-founder Chris Olah was invited to speak at its presentation in Vatican City. Olah acknowledged that frontier AI labs operate under incentives that can conflict with doing the right thing, and called for external moral voices—including religious institutions—to serve as informed critics of AI development. He highlighted three areas requiring discernment: AI's impact on the global poor and labor displacement, the conditions for human flourishing in an AI-saturated world, and the uncertain nature of AI models themselves, noting that his interpretability research has found internal states that functionally mirror emotions. The remarks represent Anthropic's effort to broaden the AI governance conversation beyond the technical community.

AI Safety Research · Regulatory Developments · mechanistic interpretability · Magnifica humanitas · Vatican City · +5 more

7·Anthropic News·19d ago

Anthropic Partners with Allen Institute and HHMI to Deploy Claude in Frontier Life Sciences Research

Anthropic has announced flagship partnerships with the Allen Institute and Howard Hughes Medical Institute (HHMI) to embed Claude into active scientific workflows at both institutions. HHMI's collaboration, anchored at Janelia Research Campus, focuses on developing specialized AI agents integrated with scientific instruments and analysis pipelines. The Allen Institute partnership targets multi-agent systems for multi-modal biological data analysis, including multi-omic integration, knowledge graph management, and experimental design coordination. Both partnerships emphasize interpretability, researcher autonomy, and transparency, with the stated goal of compressing months of manual analysis while keeping human scientists in control of scientific direction.

AI Safety Research · Enterprise Deployment Patterns · AI@HHMI · Janelia Research Campus · Claude · +4 more

6·Anthropic News·19d ago

Anthropic Launches The Anthropic Institute for AI Societal Impact Research

Anthropic is establishing The Anthropic Institute, a new interdisciplinary research body led by co-founder Jack Clark in his new role as Head of Public Benefit. The Institute consolidates and expands three existing Anthropic teams—Frontier Red Team, Societal Impacts, and Economic Research—to study AI's effects on economies, jobs, governance, and legal systems. Notable founding hires include Matt Botvinick (AI and rule of law), Anton Korinek (transformative AI economics), and Zoë Hitzig (AI social/economic impacts). Anthropic is simultaneously expanding its Public Policy organization and opening a Washington DC office.

Evaluation and Benchmarking · AI Safety Research · Dario Amodei · Frontier Red Team · Zoë Hitzig · +12 more

7·Anthropic News·18d ago

Anthropic publishes framework for safe and trustworthy agent development

Anthropic released a formal framework for responsible agent development, articulating principles around human oversight, transparency, value alignment, and privacy for autonomous AI agents. The document draws on Claude Code as a reference implementation and cites enterprise deployments at Trellix and Block as real-world examples. The framework is positioned as a contribution to emerging industry standards for agentic AI systems, acknowledging open technical challenges in value alignment measurement and oversight calibration.

AI Safety Research · Regulatory Developments · Block · Claude Code · Trellix · +2 more

6·Anthropic News·17d ago

Anthropic partners with U.S. National Labs for 1,000 Scientist AI Jam evaluating Claude on scientific tasks

Anthropic is participating in the U.S. Department of Energy's first 1,000 Scientist AI Jam, bringing together scientists across multiple National Laboratories to evaluate frontier AI models on scientific research and national security applications. Claude 3.7 Sonnet, recently launched as the first hybrid reasoning model, will be a primary subject of evaluation across tasks including hypothesis generation, experiment planning, code generation, and result analysis. This builds on Anthropic's April 2024 collaboration with the National Nuclear Security Administration, which was the first instance of a frontier lab evaluating a model in a Top Secret classified environment. The partnership signals deepening government-industry collaboration on AI for scientific discovery and national security.

Frontier Model Releases · AI Safety Research · National Nuclear Security Administration · U.S. Department of Energy · Claude 3.7 Sonnet · +2 more

7·The Batch·1mo ago

Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability

This digest covers four substantive AI developments. Anthropic's research showed that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method that uses command-line tools instead of vector indexes and outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, which revealed that Claude shows evaluation awareness more often than it discloses.

Frontier Model Releases · Evaluation and Benchmarking · Claude Opus 4.6 · GPT-Realtime-2 · Claude · +14 more