
Claude Opus 4.6: Anthropic's Milestone Model for Long-Context and Agentic Work

TL;DR: Claude Opus 4.6 is the Anthropic model that pushed the Claude line into genuinely long-horizon, autonomous work: tasks an AI runs over many steps without constant hand-holding. It arrived with a 1M-token context window (in beta), top scores on four demanding public benchmarks, and a real-world security demonstration that found 22 vulnerabilities in Firefox, 14 of them high-severity, setting the stage for Anthropic's broader push into AI-assisted cybersecurity.

Key takeaways

  • A 1M-token context window (in beta) lets it hold the equivalent of a small library in memory during a single session.
  • It claimed first place on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp — outperforming GPT-5.2 by 144 Elo points on GDPval-AA.
  • In a two-week partnership with Mozilla, it found 22 Firefox vulnerabilities, 14 of them high-severity.
  • Pricing held steady at $5/$25 per million input/output tokens — same as its predecessor Claude Opus 4.5.
  • It introduced agent teams in Claude Code and context compaction, making multi-hour autonomous coding sessions practical.
  • It was later used as the foundation for Claude Code Security, a defensive tool that found 500+ previously undetected vulnerabilities in open-source codebases.
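
To make the pricing takeaway concrete, here is a minimal sketch of the cost arithmetic at $5 per million input tokens and $25 per million output tokens. The helper name and the example token counts are illustrative assumptions, and the sketch ignores any other billing rules that may apply.

```python
# Rates quoted in this guide: $5 per million input tokens, $25 per million output tokens.
INPUT_USD_PER_MILLION = 5.00
OUTPUT_USD_PER_MILLION = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Illustrative request: 200,000 tokens in, 10,000 tokens out.
example_cost = estimate_cost_usd(200_000, 10_000)
print(f"${example_cost:.2f}")  # 1.00 for input + 0.25 for output = $1.25
```

Because output tokens cost five times as much as input tokens, a long prompt with a short answer is usually much cheaper than the reverse.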

What Claude Opus 4.6 is

Claude Opus 4.6 is a large language model made by Anthropic, the company behind the Claude family of AI assistants. Think of it as a very capable AI that can read, write, reason, and produce or review code. What made Opus 4.6 a notable step forward was its ability to work on long, complex tasks over extended periods, rather than just answering quick questions.

It was released in March 2026 and succeeded Claude Opus 4.5.

Why it matters — the "why should I care" version

Most AI tools are like a very smart colleague who can only hold a short conversation in their head at once. Claude Opus 4.6 changed that in two ways:

1. A much bigger memory. Its 1M-token context window, available in beta, means it can read and reason over roughly 750,000 words at once. That's enough to hold an entire large codebase, a stack of legal documents, or months of research notes in a single session.

2. Longer attention span for tasks. It introduced "agent teams" inside Claude Code (Anthropic's AI coding tool) and a feature called context compaction, which lets it work on a project for hours without losing track of what it's already done. Previously, AI tools would effectively "forget" earlier work as a session grew long.
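
This guide does not describe how context compaction works internally. As a rough illustration of the general idea only, the sketch below keeps a message history under a token budget by folding the oldest messages into a single summary line. Every name here (`compact`, `estimate_tokens`, `summarize`) is hypothetical, the summarizer is a trivial stand-in for what would in practice be a model call, and none of this is Anthropic's implementation.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude estimate, assuming one token is about three-quarters of a word."""
    words = len(text.split())
    return max(1, round(words / 0.75))


def summarize(messages: List[str]) -> str:
    """Stand-in summarizer: keeps only the first five words of each message."""
    heads = [" ".join(m.split()[:5]) for m in messages]
    return "Summary of earlier work: " + "; ".join(heads)


def compact(
    history: List[str],
    budget: int,
    keep_recent: int = 2,
    summarizer: Callable[[List[str]], str] = summarize,
) -> List[str]:
    """Fold older messages into one summary when the history exceeds the budget."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # already fits, or nothing old enough to fold
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarizer(older)] + recent


history = [
    "Read the build scripts and listed every module that fails to compile on the new toolchain",
    "Patched the parser module and reran its unit tests until all of them passed again",
    "Now updating the network module",
    "Next step is the storage module",
]
compacted = compact(history, budget=20)
print(len(history), "->", len(compacted))  # 4 -> 3
```

The two most recent messages survive word for word, while the older ones shrink to a single line. The trade-off is that detail in the summarized portion is lost, which is why the quality of the summary matters in any real system.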

What it's good at

Opus 4.6 claimed top scores on four demanding public tests when it launched:

  • Terminal-Bench 2.0: a test of real-world software engineering in a terminal environment
  • Humanity's Last Exam: a very hard knowledge test across many fields
  • GDPval-AA: a benchmark for general knowledge work, where it beat OpenAI's GPT-5.2 by 144 Elo points (a meaningful gap: on the standard Elo scale, a 144-point lead corresponds to an expected win rate of roughly 70% in head-to-head comparisons)
  • BrowseComp: a test of web-based research and reasoning
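
For a sense of what a 144-point Elo gap means, the standard Elo expected-score formula can be evaluated directly. This is a minimal sketch that assumes GDPval-AA ratings follow the conventional 400-point logistic scale used in chess. The guide does not state how the benchmark computes its ratings, and the function name is illustrative.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


gap = 144  # Elo gap over GPT-5.2 reported in this guide
print(f"{elo_expected_score(gap):.3f}")  # 0.696
```

Under that assumption, the higher-rated model would be expected to come out ahead in roughly seven of every ten head-to-head comparisons.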

On the AutoLab benchmark — which tests AI agents on sustained, iterative engineering tasks lasting hours — Opus 4.6 stood out as the strongest performer among 17 frontier models tested.

The Firefox security story

One of the most concrete demonstrations of what Opus 4.6 could do came from a two-week partnership with Mozilla in early 2026. Anthropic pointed the model at Firefox's source code (nearly 6,000 files of complex C++) and let it scan for security problems. It found 22 vulnerabilities, 14 of which Mozilla classified as high-severity. Those 14 equal nearly a fifth of the high-severity Firefox vulnerabilities fixed in all of 2025. Most of the findings were patched in Firefox 148.0.

This wasn't just a benchmark number — it was a real-world demonstration that an AI could do meaningful security work on production software used by hundreds of millions of people.

How it fits into the bigger picture

Opus 4.6 was the model that prompted Anthropic to take AI-assisted cybersecurity seriously as both an opportunity and a risk. The Firefox results led directly to Claude Code Security — a tool built on Opus 4.6 that scans codebases for vulnerabilities and suggests fixes, released in research preview for enterprise customers. Internal research found it uncovered more than 500 previously undetected vulnerabilities in open-source projects.

It also laid the groundwork for Project Glasswing, Anthropic's initiative to help critical infrastructure organizations patch vulnerabilities before more powerful AI models made those same vulnerabilities easier for attackers to find.

What came after

Claude Opus 4.7 succeeded Opus 4.6 in May 2026, improving on software engineering, vision, and long-horizon tasks, and adding new cybersecurity safeguards developed in response to the lessons from Opus 4.6's security work. Opus 4.6 remained the backbone of Claude Code Security even after 4.7 launched.

The Opus 4.6 generation also coincided with Anthropic's rapid commercial expansion — the company was signing major compute deals with Amazon, Google, Microsoft, and NVIDIA, and raising billions in funding — meaning the model was deployed at significant scale across cloud platforms and developer tools worldwide.

What Claude Opus 4.6 introduced and what it led to

Timeline

  1. Claude Opus 4.6 released with 1M token context and new benchmark highs

  2. Mozilla partnership: 22 Firefox vulnerabilities found in two weeks

  3. Claude Code Security launched, built on Opus 4.6, finds 500+ open-source flaws

  4. Claude Opus 4.7 released, explicitly positioned as improvement over Opus 4.6

Related topics

Anthropic · Claude Code · Claude Mythos Preview · Project Glasswing · Claude · Terminal-Bench · Google Cloud Vertex AI

FAQ

What is a 1M token context window, in plain English?

A token is roughly three-quarters of a word. One million tokens is about 750,000 words, enough to fit several long novels, a full software codebase, or months of documents into a single AI session without the model losing track of earlier content.
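
The conversion above is simple enough to sketch in a few lines. The 0.75 ratio is the rule of thumb from this answer, not an exact figure: real token counts vary with the text and the language. The function names are illustrative.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from this guide; actual ratios vary


def tokens_to_words(tokens: int) -> int:
    """Approximate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate how many tokens a text of a given word count uses."""
    return round(words / WORDS_PER_TOKEN)


print(tokens_to_words(1_000_000))  # 750000
print(words_to_tokens(90_000))     # 120000
```

By this estimate, a 90,000-word book would take up about 120,000 tokens, or roughly an eighth of the 1M-token window.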

How is Opus 4.6 different from the Claude I use on claude.ai?

Claude.ai's free and standard plans typically run on lighter, faster models like Sonnet. Opus 4.6 is the heavier, more capable model aimed at complex, long-running tasks — it costs more to run and is accessed via paid plans or the API.

Is Claude Opus 4.6 still the latest Claude model?

No. Claude Opus 4.7 and later models have since been released, each building on what Opus 4.6 established. Opus 4.6 remains notable as the model that arrived with a 1M-token context window (in beta) and demonstrated AI-assisted security research at scale.

Can Claude Opus 4.6 actually hack things?

It can find security vulnerabilities in code — as the Mozilla Firefox partnership showed — but Anthropic built safeguards to prevent it from being used offensively. The Claude Code Security product built on it is designed for defenders, not attackers.

More on Claude Opus 4.6 (6)

8 · Anthropic News · 1mo ago

Anthropic Releases Claude Opus 4.7 with Enhanced Coding, Vision, and Cyber Safeguards

Anthropic has released Claude Opus 4.7, a general-availability model positioned as a meaningful improvement over Opus 4.6 in advanced software engineering, long-horizon agentic tasks, and vision capabilities including higher image resolution. The model is notably the first to receive new cybersecurity safeguards developed in response to Project Glasswing, with automatic detection and blocking of prohibited cyber uses and a new Cyber Verification Program for legitimate security professionals. Opus 4.7 is available across Claude products, API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at the same pricing as Opus 4.6 ($5/$25 per million input/output tokens). The release is explicitly positioned below Claude Mythos Preview in overall capability, serving as a testbed for safety mechanisms before broader deployment of Mythos-class models.

8 · Hacker News · 23d ago

Claude Opus 4.8 Released by Anthropic

Anthropic has released Claude Opus 4.8, a new frontier model in their Claude lineup. The announcement appeared on Anthropic's official news page and generated significant community engagement on Hacker News with over 1,000 points and 800+ comments. Specific capability details and benchmarks are not available from the source snippet alone.

5 · Don't Worry About the Vase · 22d ago

Claude Opus 4.8: The System Card — Commentary

Zvi Mowshowitz publishes commentary on Claude Opus 4.8, released approximately six weeks after Opus 4.7. The piece appears to analyze the model's system card, suggesting a rapid iteration cadence from Anthropic. As a tier-2 commentary source, this provides analytical perspective on the release rather than primary documentation.

7 · The Batch · 19d ago

Claude Opus 4.8 Launches with Improved Honesty; Anthropic Previews Mythos-Class Models and Dynamic Workflows

Anthropic released Claude Opus 4.8 with improvements in coding, reasoning, agentic tasks, and notably better uncertainty flagging—approximately four times less likely than Opus 4.7 to let code flaws pass uncommented. Alongside the model, Anthropic introduced dynamic workflows in Claude Code enabling tens to hundreds of parallel subagents for large-scale engineering tasks, an effort-control slider, and a 3x price cut on fast mode. Anthropic also previewed Mythos-class models, positioned above Opus in capability, currently available to a limited set of organizations for cybersecurity work pending broader safety clearance. The same digest covers MiniMax M3 (open-weights, ~60% SWE-Bench Pro), Nvidia's RTX Spark superchip, Cosmos 3 world model, and a GR00T/Unitree robotics partnership.

9 · Anthropic News · 19d ago

Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks

Anthropic has released Claude Opus 4.6, its most capable model to date, featuring a 1M token context window in beta, improved agentic coding and planning capabilities, and adaptive thinking with developer-controlled effort levels. The model claims top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, outperforming OpenAI's GPT-5.2 by 144 Elo points on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.

9 · Anthropic News · 19d ago

Anthropic Introduces Claude Opus 4 and Sonnet 4 with Leading Coding Benchmarks and Agent Capabilities

Anthropic has released Claude Opus 4 and Claude Sonnet 4, positioning Opus 4 as the world's best coding model with 72.5% on SWE-bench and 43.2% on Terminal-bench, and Sonnet 4 at 72.7% on SWE-bench. Both models are hybrid (near-instant + extended thinking), support extended thinking with tool use in beta, parallel tool execution, and improved memory via local file access. Alongside the models, Anthropic is launching Claude Code as generally available with GitHub Actions, VS Code, and JetBrains integrations, plus four new API capabilities: code execution tool, MCP connector, Files API, and one-hour prompt caching. Pricing is unchanged from prior Opus and Sonnet tiers ($15/$75 and $3/$15 per million tokens respectively), with availability on Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI.