Almanac

Claude Opus 4.6: Anthropic's Long-Context Agentic Frontier Model

Claude Opus 4.6 · In-depth · active · v2 · live · generated 6d ago

TL;DR: Claude Opus 4.6 is the model that pushed Anthropic's Opus line into long-horizon agentic territory, pairing a 1M-token context window with adaptive reasoning and multi-agent orchestration. It set a new benchmark ceiling at release, demonstrated real-world security capability by finding 22 Firefox vulnerabilities in two weeks, and became the baseline for the successor models and safety-tiered deployments that followed.

Key takeaways

  • 1M-token context window (beta) with developer-controlled adaptive thinking effort and context compaction for tasks that exceed even that limit.
  • Claimed top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA (+144 Elo over GPT-5.2), and BrowseComp at launch.
  • Pricing held at $5/$25 per million input/output tokens — same as its predecessor Claude Opus 4.5.
  • In a two-week Mozilla partnership, Opus 4.6 scanned ~6,000 C++ files, filed 112 unique reports, and identified 22 Firefox vulnerabilities, 14 of them classified high-severity.
  • Topped the AutoLab benchmark of 36 ultra long-horizon research and engineering tasks, where persistence under wall-clock budgets — not initial attempt quality — was the dominant success predictor.
  • Directly preceded Claude Mythos Preview, which substantially outperformed Opus 4.6 and triggered the Project Glasswing cybersecurity consortium.

What it is

Claude Opus 4.6 was Anthropic's flagship large language model, released in February 2026 as the successor to Claude Opus 4.5. Its defining additions are a 1M-token context window (in beta), adaptive thinking with developer-controlled effort levels, and a set of agentic features, including agent teams in Claude Code and context compaction for tasks that overflow even the extended window. Together they are designed to make long-horizon, multi-step autonomous work practical rather than theoretical.

Benchmark position at launch

At release, Opus 4.6 claimed first place on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA (leading GPT-5.2 by 144 Elo), and BrowseComp. Its lineage provides context: Claude Opus 4 had scored 72.5% on SWE-bench and 43.2% on Terminal-bench, and Claude Opus 4.5 had nearly saturated CyberGym, a security benchmark whose near-saturation prompted Anthropic to test against harder real-world targets. Opus 4.6 extended those gains while holding pricing flat at $5/$25 per million input/output tokens.
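To put the GDPval-AA margin in more intuitive terms, an Elo gap can be converted into an expected pairwise win rate. The worked calculation below assumes GDPval-AA follows the conventional logistic Elo formula on a 400-point scale, with Δ the rating difference (here 144) and E the expected score of the higher-rated model; the exact rating method behind the leaderboard is not stated in the source.

```latex
E = \frac{1}{1 + 10^{-\Delta/400}}
  = \frac{1}{1 + 10^{-144/400}}
  = \frac{1}{1 + 10^{-0.36}}
  \approx \frac{1}{1 + 0.437}
  \approx 0.70
```

Under that assumption, a 144-point lead corresponds to Opus 4.6 being preferred over GPT-5.2 in roughly 70% of head-to-head comparisons, counting ties as half.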

GPT-5.4, released shortly after Opus 4.6, leapfrogged it on most benchmarks. Claude Mythos Preview, published as a model card without commercial availability, substantially outperformed Opus 4.6, scoring 83.1% on CyberGym, 82% on Terminal-Bench 2.0, 94.5% on GPQA Diamond, and 64.7% on HLE. Opus 4.6's benchmark lead was therefore short-lived in absolute terms, though it remained the strongest commercially available Anthropic model for several weeks.

Architecture and capabilities

Anthropic has not disclosed the model's internal architecture. Its externally observable capabilities include:

  • 1M-token context window (beta) with context compaction for graceful overflow handling
  • Adaptive thinking: developer-controlled effort levels that trade latency and compute against reasoning depth per call
  • Agent teams in Claude Code: coordinated multi-agent orchestration for large-scale engineering tasks
  • Parallel tool execution and local file access for persistent memory across sessions (inherited from the Opus 4 line)
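Context compaction is easiest to picture as an operation over conversation history. The sketch below illustrates the general idea only and is not Anthropic's implementation: it assumes a crude word-count proxy for tokens, a hypothetical `summarize` placeholder where a real system would call a model, and illustrative names (`Message`, `compact`, `budget`, `keep_recent`) that are not part of any Anthropic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    text: str


def count_tokens(message: Message) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(message.text.split())


def summarize(messages: list[Message]) -> Message:
    # Hypothetical placeholder: a real system would ask a model to write this summary.
    return Message(role="user", text=f"[summary of {len(messages)} earlier messages]")


def compact(history: list[Message], budget: int, keep_recent: int) -> list[Message]:
    """Replace older turns with one summary once the history exceeds `budget` tokens.

    The newest `keep_recent` messages are always kept verbatim. This is a single
    pass: it does not re-check whether the compacted result fits the budget.
    """
    total = sum(count_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)  # under budget, or nothing old enough to compact
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older), *recent]
```

Keeping the most recent turns verbatim preserves the immediate working state of an agentic task, while the summary carries forward what happened earlier.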

The AutoLab benchmark (36 expert-curated tasks spanning system optimization, puzzle-solving, model development, and CUDA kernel optimization, each evaluated under a wall-clock budget) ranked Opus 4.6 the strongest of the 17 frontier models tested. Its central finding is that persistence in iterative feedback loops, rather than the quality of the first attempt, predicts success, and Opus 4.6 stood out on exactly that dimension.
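AutoLab's central finding can be made concrete with a toy loop. This is a sketch of "persistence under a wall-clock budget" in general, not AutoLab's harness: `persist`, `attempt`, and the higher-is-better scoring convention are hypothetical, and each attempt is assumed to receive the best score so far as its feedback.

```python
import time
from typing import Callable, Optional


def persist(
    attempt: Callable[[Optional[float]], float],
    budget_seconds: float,
    max_attempts: int,
) -> tuple[Optional[float], int]:
    """Keep iterating until the wall-clock budget or the attempt cap runs out.

    Returns the best score seen (None if no attempt fit in the budget) and
    the number of attempts made.
    """
    deadline = time.monotonic() + budget_seconds
    best: Optional[float] = None
    attempts = 0
    while attempts < max_attempts and time.monotonic() < deadline:
        # Feedback: each new attempt sees the best score achieved so far.
        score = attempt(best)
        attempts += 1
        if best is None or score > best:
            best = score
    return best, attempts
```

With an attempt function that improves by a fixed step on each try, the first attempt scores the same regardless of budget; the final result depends on how many iterations fit, which mirrors the pattern the benchmark describes.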

Real-world security capability

The most consequential capability demonstration came from a two-week partnership with Mozilla in February 2026. Claude Opus 4.6 scanned approximately 6,000 C++ files in the Firefox codebase, submitted 112 unique vulnerability reports, and identified 22 vulnerabilities, 14 of which Mozilla classified as high-severity. Those 14 findings equal nearly a fifth of all high-severity Firefox vulnerabilities remediated in the whole of 2025. The collaboration also evaluated Claude's ability to write primitive exploits, probing the upper limits of AI-enabled offensive security.

This real-world result directly informed Anthropic's decision to publish a model card for Claude Mythos Preview without commercial release, and to assemble Project Glasswing — a consortium of 40+ organizations including AWS, Apple, Google, Microsoft, and CrowdStrike, funded with $100M in API credits — to patch vulnerabilities proactively before Mythos-class capabilities became widely available.
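The Firefox figures reduce to a few ratios worth keeping in mind when reading them. The small calculation below assumes each of the 22 confirmed vulnerabilities traces back to one of the 112 reports (the source does not say how reports map to findings); `firefox_ratios` is a hypothetical helper name.

```python
def firefox_ratios(reports: int, vulns: int, high_sev: int) -> dict[str, float]:
    """Simple ratios derived from the Mozilla partnership figures."""
    return {
        # Confirmed vulnerabilities per unique report filed.
        "vulns_per_report": vulns / reports,
        # Share of confirmed vulnerabilities rated high-severity.
        "high_sev_share": high_sev / vulns,
        # If the high-severity count were exactly one fifth of 2025's fixes,
        # the 2025 total would be five times that count.
        "implied_2025_high_sev_total": high_sev * 5,
    }


ratios = firefox_ratios(reports=112, vulns=22, high_sev=14)
```

Roughly one report in five led to a confirmed vulnerability, and about 64% of confirmed findings were high-severity. Because 14 is described as nearly (not exactly) a fifth of 2025's high-severity fixes, Firefox's 2025 total was likely slightly above 70.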

Ecosystem integration

Opus 4.6 is distributed via the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Claude Code — generally available since the Opus 4 launch — uses Opus 4.6 as its primary model for autonomous coding sessions, with integrations across GitHub Actions, VS Code, and JetBrains. Claude Code Security, released in limited research preview in February 2026, is explicitly built on Opus 4.6 and found over 500 previously undetected vulnerabilities in production open-source codebases during internal research.

The compute infrastructure underpinning Opus 4.6's deployment includes Anthropic's 10-year, $100B+ commitment with Amazon Web Services (up to 5 GW on Trainium chips), a $30B Azure compute deal with Microsoft, and access to NVIDIA Grace Blackwell and Vera Rubin systems. All of these also support rate-limit expansions for Claude Code and API users.

Safety posture

Opus 4.6 is deployed under Anthropic's ASL-3 protections, first activated for Claude Opus 4. These include Constitutional Classifiers that block end-to-end CBRN weapon-development workflows, plus enhanced security for model weights. Separately, AI control research on the BashArena benchmark used Opus 4.6 as the untrusted model. It found that resampling (drawing five samples per step and deciding whether to audit based on the maximum suspicion score) raised safety from 61% to 71% at a 0.3% audit budget, while a retry strategy backfired by leaking the monitor's rationale to the model.
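The resampling protocol can be sketched in a few lines. This is an illustration under stated assumptions, not the BashArena study's code: the audit budget is taken to be the fraction of honest steps that may be audited, `propose` stands in for sampling one action from the untrusted model, `monitor` stands in for the trusted monitor's suspicion score, and executing the least suspicious sample when no audit fires is this sketch's own choice. All function names are hypothetical.

```python
from typing import Callable, Sequence


def calibrate_threshold(honest_step_scores: Sequence[float], audit_budget: float) -> float:
    """Return a threshold that flags at most `audit_budget` of honest steps.

    `honest_step_scores` are per-step maximum suspicion scores measured on
    honest runs; assumes a non-empty list and 0 <= audit_budget < 1. A step is
    audited when its score is strictly above the returned threshold.
    """
    ranked = sorted(honest_step_scores, reverse=True)
    affordable_audits = int(audit_budget * len(ranked))
    return ranked[affordable_audits]


def resample_step(
    propose: Callable[[], str],
    monitor: Callable[[str], float],
    threshold: float,
    k: int = 5,
) -> tuple[str, str]:
    """Draw k candidate actions; audit if the most suspicious one crosses the threshold."""
    candidates = [propose() for _ in range(k)]
    scores = [monitor(c) for c in candidates]
    worst = max(range(k), key=lambda i: scores[i])
    if scores[worst] > threshold:
        return "audit", candidates[worst]
    # Sketch's own assumption: otherwise execute the least suspicious candidate.
    safest = min(range(k), key=lambda i: scores[i])
    return "execute", candidates[safest]
```

With 1,000 honest steps and a 0.3% budget, the threshold sits at the fourth-highest honest score, so at most three honest steps are audited. Taking the maximum over several samples gives the monitor more chances to catch a suspicious action at each step.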

Successor trajectory

Opus 4.6 was followed by Opus 4.7 (which added Project Glasswing cybersecurity safeguards and a Cyber Verification Program for legitimate security professionals), Opus 4.8 (which improved uncertainty flagging and introduced dynamic parallel subagent workflows), and eventually the Mythos-class models. Each successor was explicitly positioned relative to Opus 4.6 as a baseline — making it the reference point for the capability escalation that defined Anthropic's mid-2026 model roadmap.

Claude Opus 4.6: capability lineage and deployment footprint

Opus 4.6 in the Claude Opus lineage and against key rivals

| Model | Context window | Key benchmark result | Pricing (input / output per M tokens) | Notable |
|---|---|---|---|---|
| Claude Opus 4 | 200K | 72.5% SWE-bench, 43.2% Terminal-bench | $15 / $75 | Hybrid thinking, parallel tools; first ASL-3 deployment |
| Claude Opus 4.5 | 200K | Near-saturated CyberGym; best-in-class coding at launch | $5 / $25 | 65% token efficiency gain; computer use |
| Claude Opus 4.6 | 1M (beta) | SOTA on Terminal-Bench 2.0, HLE, GDPval-AA (+144 Elo vs GPT-5.2), BrowseComp | $5 / $25 | Adaptive effort, agent teams, context compaction |
| Claude Opus 4.7 | | Leads Vals AI Finance Agent benchmark at 64.37% | $5 / $25 | First model with Project Glasswing cybersecurity safeguards |
| GPT-5.2 | | Trailed Opus 4.6 by 144 Elo on GDPval-AA | | |
| GPT-5.4 | 1.05M | Leapfrogged Opus 4.6 on most benchmarks post-release | $30 / $180 (Pro) | Native computer use, tool search |

All figures are drawn from the underlying source material; blank cells were not reported. GPT-5.4 was released after Opus 4.6 and is included for competitive context.

Timeline

  1. Claude Opus 4.6 released with 1M-token context, adaptive thinking, and agent teams in Claude Code

  2. Claude Code Security (built on Opus 4.6) released in limited preview, finding 500+ vulnerabilities in open-source codebases

  3. Mozilla partnership results published: Opus 4.6 found 22 Firefox vulnerabilities (14 high-severity) in two weeks

  4. Claude Mythos Preview model card published — substantially outperforms Opus 4.6; Project Glasswing consortium formed

  5. Claude Opus 4.7 released, explicitly positioned below Mythos Preview; first model with new cybersecurity safeguards

  6. AutoLab benchmark published: Opus 4.6 is strongest performer across 17 frontier models on ultra long-horizon tasks

Related topics

Anthropic · Claude Code · Claude Mythos Preview · Project Glasswing · Claude · Google Cloud Vertex AI · OpenAI · Terminal-Bench

FAQ

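How does Opus 4.6 handle tasks that outgrow the 1M-token window?

Through context compaction, which shipped with the release: as a long-running agentic task approaches the limit, earlier context is compressed into a summary so the task can continue instead of overflowing even the 1M-token window. Compaction applies to context accumulated over a session; any single request still has to fit within the window.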

What is adaptive thinking and how does a developer control it?

Adaptive thinking lets the model scale its reasoning effort per query; developers set the effort level via API, trading latency and cost against depth of reasoning on a per-call basis.
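Choosing an effort level is a per-call policy decision on the developer's side. A minimal sketch of such a policy follows; the `Effort` labels, the `CallPlan` type, and the `plan_call` heuristic are all hypothetical, and the actual request parameter names and accepted values should come from Anthropic's API reference, not from this example.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical effort labels; the real API's accepted values may differ.
Effort = Literal["low", "medium", "high"]


@dataclass(frozen=True)
class CallPlan:
    prompt: str
    effort: Effort  # map this onto the real effort parameter when building a request


def plan_call(prompt: str, *, interactive: bool, multi_step: bool) -> CallPlan:
    """Pick a per-call effort level from two coarse task traits (illustrative policy)."""
    effort: Effort
    if interactive and not multi_step:
        # A user is waiting on a simple answer: favour latency and cost.
        effort = "low"
    elif multi_step and not interactive:
        # Background agentic work: spend more compute on reasoning depth.
        effort = "high"
    else:
        effort = "medium"
    return CallPlan(prompt=prompt, effort=effort)
```

Keeping the policy in one function makes it easy to log which effort level each call used and to tune the rules against observed latency and output quality.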

Is Opus 4.6 still the top Anthropic model?

No. It was succeeded by Opus 4.7 (which added cybersecurity safeguards) and then Opus 4.8, and it sits below the Mythos-class models, for which Anthropic published a model card without a commercial release.

What made the Mozilla Firefox collaboration significant?

It was a real-world demonstration of Opus 4.6's offensive security capability: 22 vulnerabilities found in two weeks across ~6,000 C++ files, with 14 rated high-severity — nearly a fifth of all high-severity Firefox vulnerabilities remediated in 2025.

Where can Opus 4.6 be accessed?

Via the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, as well as Claude Code for autonomous coding workflows.

More on Claude Opus 4.6 (6)

Anthropic News · 1mo ago

Anthropic Releases Claude Opus 4.7 with Enhanced Coding, Vision, and Cyber Safeguards

Anthropic has released Claude Opus 4.7, a general-availability model positioned as a meaningful improvement over Opus 4.6 in advanced software engineering, long-horizon agentic tasks, and vision capabilities including higher image resolution. The model is notably the first to receive new cybersecurity safeguards developed in response to Project Glasswing, with automatic detection and blocking of prohibited cyber uses and a new Cyber Verification Program for legitimate security professionals. Opus 4.7 is available across Claude products, API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at the same pricing as Opus 4.6 ($5/$25 per million input/output tokens). The release is explicitly positioned below Claude Mythos Preview in overall capability, serving as a testbed for safety mechanisms before broader deployment of Mythos-class models.

Hacker News · 23d ago

Claude Opus 4.8 Released by Anthropic

Anthropic has released Claude Opus 4.8, a new frontier model in its Claude lineup. The announcement appeared on Anthropic's official news page and generated significant community engagement on Hacker News, with over 1,000 points and 800+ comments.

Don't Worry About the Vase · 22d ago

Claude Opus 4.8: The System Card — Commentary

Zvi Mowshowitz publishes commentary on Claude Opus 4.8, released approximately six weeks after Opus 4.7. The piece analyzes the model's system card, and the short gap between releases suggests a rapid iteration cadence at Anthropic. As commentary, it offers analytical perspective on the release rather than primary documentation.

The Batch · 19d ago

Claude Opus 4.8 Launches with Improved Honesty; Anthropic Previews Mythos-Class Models and Dynamic Workflows

Anthropic released Claude Opus 4.8 with improvements in coding, reasoning, agentic tasks, and notably better uncertainty flagging—approximately four times less likely than Opus 4.7 to let code flaws pass uncommented. Alongside the model, Anthropic introduced dynamic workflows in Claude Code enabling tens to hundreds of parallel subagents for large-scale engineering tasks, an effort-control slider, and a 3x price cut on fast mode. Anthropic also previewed Mythos-class models, positioned above Opus in capability, currently available to a limited set of organizations for cybersecurity work pending broader safety clearance. The same digest covers MiniMax M3 (open-weights, ~60% SWE-Bench Pro), Nvidia's RTX Spark superchip, Cosmos 3 world model, and a GR00T/Unitree robotics partnership.

Anthropic News · 19d ago

Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks

Anthropic has released Claude Opus 4.6, its most capable model to date, featuring a 1M token context window in beta, improved agentic coding and planning capabilities, and adaptive thinking with developer-controlled effort levels. The model claims top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, outperforming OpenAI's GPT-5.2 by 144 Elo points on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.

Anthropic News · 19d ago

Anthropic Introduces Claude Opus 4 and Sonnet 4 with Leading Coding Benchmarks and Agent Capabilities

Anthropic has released Claude Opus 4 and Claude Sonnet 4, positioning Opus 4 as the world's best coding model with 72.5% on SWE-bench and 43.2% on Terminal-bench, and Sonnet 4 at 72.7% on SWE-bench. Both models are hybrid (near-instant + extended thinking), support extended thinking with tool use in beta, parallel tool execution, and improved memory via local file access. Alongside the models, Anthropic is launching Claude Code as generally available with GitHub Actions, VS Code, and JetBrains integrations, plus four new API capabilities: code execution tool, MCP connector, Files API, and one-hour prompt caching. Pricing is unchanged from prior Opus and Sonnet tiers ($15/$75 and $3/$15 per million tokens respectively), with availability on Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI.