Almanac
5·Anthropic News·17d ago

Anthropic releases Claude Instant 1.2 with improved math, coding, and safety

Anthropic released Claude Instant 1.2, an updated version of its faster, lower-cost model tier, now available via API. The release incorporates capabilities from Claude 2 and improves on two benchmarks: 58.7% on Codex (versus 52.8% for Claude Instant 1.1) and 86.7% on GSM8K (versus 80.9%). Safety improvements include reduced hallucination and greater jailbreak resistance as measured by automated red-teaming.

Related events (8)

7·Anthropic News·16d ago

Anthropic launches Claude 2 with 100K context window and improved coding, reasoning, and safety

Anthropic released Claude 2, featuring a 100K token context window and improved performance on coding (71.2% on Codex HumanEval, up from 56.0%), math (88.0% on GSM8K), and legal reasoning (76.5% on the multiple-choice section of the Bar exam). The model is available via API at the same price as Claude 1.3 and through a new public beta at claude.ai for US and UK users. Safety improvements include a 2x reduction in harmful outputs on internal red-team evaluations compared to Claude 1.3. Early API partners include Jasper and Sourcegraph.

7·Anthropic News·16d ago

Anthropic releases Claude 2.1 with 200K context window, reduced hallucinations, and tool use beta

Anthropic released Claude 2.1, featuring an industry-first 200,000-token context window (roughly 500 pages), a claimed 2x reduction in hallucination rates versus Claude 2.0, and a new beta tool-use capability allowing Claude to orchestrate across developer-defined APIs and functions. The release also introduces system prompts and a revamped developer Workbench console. Claude 2.1 is available via API and powers claude.ai for both free and Pro tiers, with the 200K context window reserved for Pro users.

7·Anthropic News·17d ago

Anthropic launches Claude publicly with two model tiers after closed alpha

Anthropic announced the public launch of Claude on March 14, 2023, following a closed alpha with partners including Notion, Quora, and DuckDuckGo. The release introduced two model variants, Claude (high-performance) and Claude Instant (lighter and faster), both accessible via chat interface and API. Early partners reported that Claude produced fewer harmful outputs and was more steerable than competing models, with deployments spanning education, legal tech, productivity, and search.

9·Anthropic News·19d ago

Anthropic Releases Claude Opus 4.5 with State-of-the-Art Coding, Agent, and Computer Use Capabilities

Anthropic has released Claude Opus 4.5, positioning it as the best model in the world for coding, agentic workflows, and computer use, with pricing reduced to $5/$25 per million input/output tokens. The model demonstrates significant token efficiency gains (up to 65% fewer tokens than prior models on equivalent tasks) alongside improvements in long-horizon autonomous task execution, multi-step reasoning, and self-improving agent behavior. The release is accompanied by updates to Claude Code, the Claude Developer Platform, and integrations with Excel, Chrome, and desktop environments. Early partner feedback from GitHub Copilot, Cursor, Notion, Warp, and others reports measurable benchmark improvements and use cases that were previously out of reach.

9·The Batch·8d ago

Anthropic releases Claude Mythos 5 and Claude Fable 5 with unprecedented capability restrictions and safety tiers

Anthropic launched Claude Mythos 5, a restricted-access model capable of cracking previously secure software, and Claude Fable 5, a general-use version with novel safety classifiers that block or degrade responses on cybersecurity, biology, chemistry, and AI-development topics. Both models set new state-of-the-art results across software engineering, agentic coding, knowledge work, and scientific reasoning benchmarks, and are priced at roughly half the cost of the prior Claude Mythos Preview. Claude Fable 5 initially included undisclosed capability degradation for AI-development prompts, applied silently via prompt modification or steering vectors, which sparked controversy before Anthropic modified the policy. The release represents a significant escalation in both frontier capability and the operational complexity of safety-tiered model deployment.

7·Anthropic News·18d ago

Claude Opus 4.1 Released with 74.5% SWE-bench Verified Score

Anthropic has released Claude Opus 4.1, an incremental upgrade to Claude Opus 4 focused on agentic tasks, coding, and reasoning. The model achieves 74.5% on SWE-bench Verified (without extended thinking) and shows notable gains in multi-file code refactoring and large-codebase debugging. It is available to paid Claude users, in Claude Code, and through the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI at the same price as Opus 4. Anthropic notes that substantially larger model improvements are planned for the coming weeks.

8·Hacker News·23d ago

Claude Opus 4.8 Released by Anthropic

Anthropic has released Claude Opus 4.8, a new frontier model in its Claude lineup. The announcement appeared on Anthropic's official news page and drew significant engagement on Hacker News, with over 1,000 points and 800+ comments. Specific capability details and benchmarks are not available from the source snippet alone.

9·Anthropic News·19d ago

Anthropic Releases Claude Sonnet 4.5: Top Coding and Computer-Use Model with Agent SDK

Anthropic has released Claude Sonnet 4.5, claiming it is the best coding model and strongest model for building complex agents, with a 61.4% score on OSWorld (up from 42.2% for Sonnet 4) and state-of-the-art performance on SWE-bench Verified. The release is accompanied by major product upgrades including checkpoints in Claude Code, a native VS Code extension, a Claude Agent SDK giving developers access to the same infrastructure powering Claude Code, and new context editing and memory tools in the Claude API. Pricing is unchanged from Sonnet 4 at $3/$15 per million input/output tokens. Early enterprise customers including Cursor, GitHub Copilot, Devin, Canva, and Figma report significant gains in coding, agentic, and long-context tasks.