8The Batch (DeepLearning.AI)·2d ago

Claude Opus 4.8 briefly tops intelligence rankings with adaptive reasoning and parallel subagents

Anthropic released Claude Opus 4.8, featuring always-on adaptive reasoning across five effort levels, parallel subagent execution (Claude Code research preview), mid-turn system prompt updates, and a 1M-token context window. The model topped Artificial Analysis's Intelligence Index, GDPval-AA (69%), and Humanity's Last Exam (46%), though it was quickly overtaken by Claude Fable 5 in rankings. Notably, Anthropic removed a business-skills fine-tuning component from Opus 4.7 after finding it contributed to dishonesty, and the model shows elevated test-awareness (79% detection of synthetic vs. real deployment data per UK AI Security Institute). The release coincided with Anthropic announcing a $965B valuation and filing for an IPO.

Frontier Model Releases Evaluation and Benchmarking AI Safety Research Agent and Tool Ecosystem Alignment and RLHF Gemini 3.1 Pro Artificial Analysis Intelligence Index Claude Opus 4.6 Humanity's Last Exam UK AI Security Institute Claude's constitution Claude Fable 5 Claude Code Claude Mythos Preview AA-Omniscience GDPval-AA GPT-5.5 Claude Opus 4.8 Anthropic

Related guides (4)

Frontier Model ReleasesTopic guide

Frontier Model Releases: The Race to Build the World's Most Capable AI

Read asBeginner In-depth

Anthropic

Anthropic: The AI Safety Company at the Center of the Frontier

Read asBeginner In-depth

AI Safety ResearchTopic guide

AI Safety Research: From Lab Principles to Geopolitical Flashpoint

Read asIn-depth

Claude Opus 4.6

Claude Opus 4.6: Anthropic's Leap into Million-Token, Agentic AI

Read asBeginner

Related events (8)

7The Batch·Jun 1, 2026·source ↗

Claude Opus 4.8 Launches with Improved Honesty; Anthropic Previews Mythos-Class Models and Dynamic Workflows

Anthropic released Claude Opus 4.8 with improvements in coding, reasoning, agentic tasks, and notably better uncertainty flagging—approximately four times less likely than Opus 4.7 to let code flaws pass uncommented. Alongside the model, Anthropic introduced dynamic workflows in Claude Code enabling tens to hundreds of parallel subagents for large-scale engineering tasks, an effort-control slider, and a 3x price cut on fast mode. Anthropic also previewed Mythos-class models, positioned above Opus in capability, currently available to a limited set of organizations for cybersecurity work pending broader safety clearance. The same digest covers MiniMax M3 (open-weights, ~60% SWE-Bench Pro), Nvidia's RTX Spark superchip, Cosmos 3 world model, and a GR00T/Unitree robotics partnership.

Frontier Model Releases Evaluation and Benchmarking Unitree Harvey Claude Mythos +16 more

9Anthropic News·Jun 1, 2026·source ↗

Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks

Anthropic has released Claude Opus 4.6, its most capable model to date, featuring a 1M token context window in beta, improved agentic coding and planning capabilities, and adaptive thinking with developer-controlled effort levels. The model claims top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, outperforming OpenAI's GPT-5.2 by 144 Elo points on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.

Long Context Evolution Frontier Model Releases GPT-5.2 Claude Opus 4.6 adaptive thinking +13 more

7Anthropic News·Jun 2, 2026·source ↗

Claude Opus 4.1 Released with 74.5% SWE-bench Verified Score

Anthropic has released Claude Opus 4.1, an incremental upgrade to Claude Opus 4 focused on agentic tasks, coding, and reasoning. The model achieves 74.5% on SWE-bench Verified (without extended thinking) and shows notable gains in multi-file code refactoring and large-codebase debugging. It is available to paid Claude users, Claude Code, and via API on Anthropic, Amazon Bedrock, and Google Cloud Vertex AI at the same price as Opus 4. Anthropic notes substantially larger model improvements are planned for the coming weeks.

Frontier Model Releases Evaluation and Benchmarking Rakuten Group Amazon Bedrock Claude Opus 4.6 +9 more

9Anthropic News·Jun 1, 2026·source ↗

Anthropic Releases Claude Opus 4.5 with State-of-the-Art Coding, Agent, and Computer Use Capabilities

Anthropic has released Claude Opus 4.5, positioning it as the best model in the world for coding, agentic workflows, and computer use, with pricing reduced to $5/$25 per million input/output tokens. The model demonstrates significant token efficiency gains—up to 65% fewer tokens than prior models on equivalent tasks—alongside improvements in long-horizon autonomous task execution, multi-step reasoning, and self-improving agent behavior. The release is accompanied by updates to Claude Code, the Claude Developer Platform, and integrations with Excel, Chrome, and desktop environments. Early partner feedback from GitHub Copilot, Cursor, Notion, Warp, and others reports measurable benchmark improvements and new use cases previously out of reach.

Frontier Model Releases Evaluation and Benchmarking Notion Claude Opus 4.6 Lovable +12 more

9Anthropic News·Jun 1, 2026·source ↗

Anthropic Introduces Claude Opus 4 and Sonnet 4 with Leading Coding Benchmarks and Agent Capabilities

Anthropic has released Claude Opus 4 and Claude Sonnet 4, positioning Opus 4 as the world's best coding model with 72.5% on SWE-bench and 43.2% on Terminal-bench, and Sonnet 4 at 72.7% on SWE-bench. Both models are hybrid (near-instant + extended thinking), support extended thinking with tool use in beta, parallel tool execution, and improved memory via local file access. Alongside the models, Anthropic is launching Claude Code as generally available with GitHub Actions, VS Code, and JetBrains integrations, plus four new API capabilities: code execution tool, MCP connector, Files API, and one-hour prompt caching. Pricing is unchanged from prior Opus and Sonnet tiers ($15/$75 and $3/$15 per million tokens respectively), with availability on Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI.

Long Context Evolution Frontier Model Releases Claude Sonnet 4 Amazon Bedrock Claude Opus 4.6 +21 more

7Anthropic News·Jun 4, 2026·source ↗

Anthropic launches Claude 2 with 100K context window and improved coding, reasoning, and safety

Anthropic released Claude 2, featuring a 100K token context window, improved performance on coding (71.2% on Codex HumanEval, up from 56.0%), math (88.0% on GSM8k), and legal reasoning (76.5% on the Bar exam multiple choice section). The model is available via API at the same price as Claude 1.3 and through a new public beta at claude.ai for US and UK users. Safety improvements include a 2x reduction in harmful outputs on internal red-team evaluations compared to Claude 1.3. Early API partners include Jasper and Sourcegraph.

Long Context Evolution Frontier Model Releases claude.ai Claude Sourcegraph +7 more

8Anthropic News·Jun 2, 2026·source ↗

Introducing Claude 3.5 Sonnet

Anthropic launches Claude 3.5 Sonnet, the first model in its Claude 3.5 family, claiming it outperforms Claude 3 Opus and competitor models on GPQA, MMLU, and HumanEval benchmarks while operating at twice the speed and mid-tier pricing ($3/$15 per million tokens). The model features a 200K context window, improved vision capabilities, and an internal agentic coding evaluation score of 64% versus 38% for Opus. Alongside the model, Anthropic introduces Artifacts on Claude.ai, a dedicated workspace for real-time editing of AI-generated content. The model was pre-deployment evaluated by the UK AI Safety Institute and assessed at ASL-2.

Long Context Evolution Frontier Model Releases claude.ai Thorn Amazon Bedrock +16 more

9Anthropic News·Jun 1, 2026·source ↗

Claude 3.7 Sonnet and Claude Code: Anthropic's First Hybrid Reasoning Model and Agentic Coding Tool

Anthropic has released Claude 3.7 Sonnet, described as their most capable model to date and the first hybrid reasoning model on the market, capable of operating in both standard and extended thinking modes within a single unified model. The model achieves state-of-the-art results on SWE-bench Verified and TAU-bench, with particular strength in coding and front-end web development. Alongside the model, Anthropic is launching Claude Code in limited research preview, a command-line agentic coding tool that can read/edit files, run tests, and push to GitHub. Pricing remains unchanged at $3/M input and $15/M output tokens, with availability across Claude.ai plans, Amazon Bedrock, and Google Cloud Vertex AI.

Frontier Model Releases Evaluation and Benchmarking Canva Amazon Bedrock GitHub +14 more