Anthropic partners with U.S. National Labs for 1,000 Scientist AI Jam evaluating Claude on scientific tasks
Anthropic is participating in the U.S. Department of Energy's first 1,000 Scientist AI Jam, bringing together scientists across multiple National Laboratories to evaluate frontier AI models on scientific research and national security applications. Claude 3.7 Sonnet, recently launched as the first hybrid reasoning model, will be a primary subject of evaluation across tasks including hypothesis generation, experiment planning, code generation, and result analysis. This builds on Anthropic's April 2024 collaboration with the National Nuclear Security Administration, which was the first instance of a frontier lab evaluating a model in a Top Secret classified environment. The partnership signals deepening government-industry collaboration on AI for scientific discovery and national security.
Related events (8)
Anthropic Partners with US Department of Energy on Genesis Mission for AI-Driven Scientific Discovery
Anthropic and the US Department of Energy have announced a multi-year partnership under the DOE's Genesis Mission initiative, targeting AI deployment across energy, biological sciences, and scientific productivity domains. The partnership will give DOE researchers access to Claude and to Anthropic engineers, who will develop purpose-built agents, Model Context Protocol servers, and specialized Claude Skills for scientific workflows. The collaboration could reach all 17 US national laboratories and builds on prior work, including a nuclear risk classifier developed with the National Nuclear Security Administration and a Claude deployment at Lawrence Livermore. This represents a significant expansion of Anthropic's US government footprint.
Anthropic Partners with Allen Institute and HHMI to Deploy Claude in Frontier Life Sciences Research
Anthropic has announced flagship partnerships with the Allen Institute and Howard Hughes Medical Institute (HHMI) to embed Claude into active scientific workflows at both institutions. HHMI's collaboration, anchored at Janelia Research Campus, focuses on developing specialized AI agents integrated with scientific instruments and analysis pipelines. The Allen Institute partnership targets multi-agent systems for multi-modal biological data analysis, including multi-omic integration, knowledge graph management, and experimental design coordination. Both partnerships emphasize interpretability, researcher autonomy, and transparency, with the stated goal of compressing months of manual analysis while keeping human scientists in control of scientific direction.
Anthropic makes Claude 3 Haiku and Sonnet available to US Intelligence Community and AWS GovCloud
Anthropic has made Claude 3 Haiku and Claude 3 Sonnet available via AWS Marketplace for the US Intelligence Community and AWS GovCloud, marking a significant expansion into government deployment. The company has crafted contractual exceptions to its general Usage Policy to permit legally authorized foreign intelligence analysis, including combating human trafficking and identifying covert influence campaigns, while maintaining restrictions on disinformation, weapons design, and malicious cyber operations. The deployment is currently limited to ASL-2 models under Anthropic's Responsible Scaling Policy. Anthropic also notes that it gave the UK AI Safety Institute access to Claude 3.5 Sonnet for pre-deployment testing.
How scientists are using Claude to accelerate research and discovery
Anthropic describes how researchers are deploying Claude-powered systems across scientific workflows, highlighting three case studies, including Biomni (a Stanford agentic platform integrating hundreds of biomedical tools) and the Cheeseman Lab (automating the interpretation of large-scale gene knockout experiments). The piece details Claude for Life Sciences and the AI for Science program, which provides free API credits to high-impact research projects. Specific results cited include compressing months-long GWAS analyses to 20 minutes and analyzing 336,000 single-cell datasets to identify novel transcription factors.
Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku
Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models), a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed, and a public beta of 'computer use', a capability that lets Claude control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. The US and UK AI Safety Institutes both conducted pre-deployment testing, and the model was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.
Introducing Claude 3.5 Sonnet
Anthropic launches Claude 3.5 Sonnet, the first model in its Claude 3.5 family, claiming it outperforms Claude 3 Opus and competitor models on GPQA, MMLU, and HumanEval benchmarks while operating at twice the speed of Claude 3 Opus and at mid-tier pricing ($3 per million input tokens and $15 per million output tokens). The model features a 200K context window, improved vision capabilities, and a score of 64% on an internal agentic coding evaluation, versus 38% for Claude 3 Opus. Alongside the model, Anthropic introduces Artifacts on Claude.ai, a dedicated workspace for real-time editing of AI-generated content. The UK AI Safety Institute evaluated the model before deployment, and the model was assessed at ASL-2.
Anthropic Frontier Red Team reports early-warning signs of rapid AI progress in cybersecurity and biosecurity capabilities
Anthropic's Frontier Red Team published findings from a year of safety evaluations across four model releases, documenting rapid capability gains in dual-use domains. In cybersecurity, Claude 3.7 Sonnet now solves roughly a third of Cybench CTF challenges (up from ~5% a year ago), and with the Incalmo toolset was able to replicate a large-scale network attack in realistic cyber range environments. In biosecurity, Claude has moved from underperforming virology experts to exceeding them on the VCT benchmark within one year, and exceeds human expert baselines on cloning workflows. Anthropic assesses current models as showing 'early warning' signs but not yet crossing thresholds of substantially elevated national security risk.
Claude 3.7 Sonnet and Claude Code: Anthropic's First Hybrid Reasoning Model and Agentic Coding Tool
Anthropic has released Claude 3.7 Sonnet, described as its most capable model to date and the first hybrid reasoning model on the market, capable of operating in both standard and extended thinking modes within a single unified model. The model achieves state-of-the-art results on SWE-bench Verified and TAU-bench, with particular strength in coding and front-end web development. Alongside the model, Anthropic is launching Claude Code in limited research preview, a command-line agentic coding tool that can read and edit files, run tests, and push to GitHub. Pricing remains unchanged at $3 per million input tokens and $15 per million output tokens, with availability across Claude.ai plans, Amazon Bedrock, and Google Cloud Vertex AI.



