7The Batch (DeepLearning.AI)·10d ago

The Batch: Claude Mythos 5 / Fable 5 debut, Apple AFM 3, Google Live Translate, OpenAI IPO filing, FrontierCode benchmark

Anthropic launched Claude Fable 5 (a safety-guardrailed model) and Claude Mythos 5 (same underlying model with safeguards removed, for vetted cyberdefense/infrastructure users via Project Glasswing with US government collaboration), both priced at $10/$50 per million tokens. Apple released five new Apple Foundation Models (AFM 3) spanning on-device and cloud tiers, built with Google and Nvidia infrastructure. Additional headlines cover Google's Gemini 3.5 Live Translate (70+ languages, real-time), OpenAI's confidential SEC IPO filing, a NotebookLM upgrade to Gemini 3.5, and Cognition's FrontierCode benchmark for code-quality evaluation where Claude Opus 4.8 leads at 34.3%.

Related guides (5)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Read asBeginner

Claude Opus 4.6

Claude Opus 4.6: Anthropic's Milestone Model for Long-Context and Agentic Work

Read asBeginner

GPT-5.5

GPT-5.5: OpenAI's Benchmark-Leading Agentic Model with a Hallucination Problem

Read asIn-depth

NVIDIA

NVIDIA: The Hardware Backbone of the AI Era

Read asBeginner In-depth

Frontier Model ReleasesTopic guide

Frontier Model Releases: The Race from GPT-3 to Safety-Tiered Superintelligence

Read asIn-depth

Related events (8)

8The Batch·8d ago·source ↗

Anthropic launches Claude Mythos 5 and Claude Fable 5; Andrew Ng introduces OpenCoworker desktop agent

Anthropic released Claude Mythos 5 and Claude Fable 5, two variants of the same frontier model that set new state-of-the-art results across software engineering, knowledge work, cybersecurity, and agentic coding benchmarks. Claude Fable 5 is the general-availability version with safety classifiers that restrict responses on security, biology, chemistry, and cutting-edge AI topics, priced at $10/$50 per million input/output tokens; Mythos 5 is restricted to selected partners via Project Glasswing. Separately, Andrew Ng and collaborators released OpenCoworker, a free open-source desktop agent harness built on top of aisuite, designed to give users privacy-preserving agentic workflows with their own API keys or local models. The newsletter also contextualizes the broader shift toward LLM-driven agent harnesses as frontier models have become capable enough to reliably drive next-action decisions.

Frontier Model Releases AI Safety Research Ollama DeepLearning.AI Claude Mythos +13 more

8The Batch·19d ago·source ↗

Anthropic Releases Claude Mythos Preview with Extraordinary Cybersecurity Capabilities, Forms Project Glasswing Consortium

Anthropic has published a 244-page model card for Claude Mythos Preview, a large language model not yet commercially available, which broadly outperforms Claude Opus 4.6 and is described as 'strikingly capable' at identifying and exploiting code vulnerabilities. To mitigate risks before potential release, Anthropic assembled Project Glasswing, a consortium including AWS, Apple, Google, Microsoft, CrowdStrike, Nvidia, and 40+ other organizations, funded with $100 million in API credits and $4 million in open-source security donations. This marks the first time Anthropic has published a model card without making the model commercially available, signaling an unusual safety-first deployment posture. The issue also includes commentary from Andrew Ng on AI's impact on software engineering jobs, arguing against an 'AI jobpocalypse' narrative.

Frontier Model Releases AI Safety Research JPMorganChase Linux Foundation Claude Opus 4.6 +14 more

9The Batch·8d ago·source ↗

Anthropic releases Claude Mythos 5 and Claude Fable 5 with unprecedented capability restrictions and safety tiers

Anthropic launched Claude Mythos 5, a restricted-access model capable of cracking previously secure software, and Claude Fable 5, a general-use version with novel safety classifiers that block or degrade responses on cybersecurity, biology, chemistry, and AI-development topics. Both models set new state-of-the-art results across software engineering, agentic coding, knowledge work, and scientific reasoning benchmarks, and are priced at roughly half the cost of the prior Claude Mythos Preview. Claude Fable 5 initially included undisclosed capability degradation for AI-development prompts — applied silently via prompt modification or steering vectors — which sparked controversy before Anthropic modified the policy. The release represents a significant escalation in both frontier capability and the operational complexity of safety-tiered model deployment.

Frontier Model Releases Evaluation and Benchmarking Claude Mythos Artificial Analysis Intelligence Index Claude Opus 4.6 +9 more

8The Batch·18d ago·source ↗

Claude Mythos Preview: Limited-Release Frontier Model with Exceptional Cybersecurity Capabilities

Anthropic has published a 244-page model card for Claude Mythos Preview, a frontier model not yet commercially available, which autonomously discovered thousands of high-severity vulnerabilities in popular operating systems and browsers during testing. To mitigate risks before potential deployment, Anthropic assembled Project Glasswing, a consortium of over 40 organizations including AWS, Apple, Google, Microsoft, and CrowdStrike, funded with $100M in model credits to patch vulnerabilities proactively. The model substantially outperforms Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro across multiple benchmarks including CyberGym (83.1%), Terminal-Bench 2.0 (82%), GPQA Diamond (94.5%), HLE (64.7%), and GraphWalks long-context (80%). The Batch notes parallels to OpenAI's GPT-2 limited-release strategy and characterizes the announcement as having elements of a publicity stunt alongside genuine safety concerns.

Frontier Model Releases Evaluation and Benchmarking Gemini 3.1 Pro GraphWalks Linux Foundation +18 more

9Anthropic News·19d ago·source ↗

Anthropic Releases Claude Opus 4.5 with State-of-the-Art Coding, Agent, and Computer Use Capabilities

Anthropic has released Claude Opus 4.5, positioning it as the best model in the world for coding, agentic workflows, and computer use, with pricing reduced to $5/$25 per million input/output tokens. The model demonstrates significant token efficiency gains—up to 65% fewer tokens than prior models on equivalent tasks—alongside improvements in long-horizon autonomous task execution, multi-step reasoning, and self-improving agent behavior. The release is accompanied by updates to Claude Code, the Claude Developer Platform, and integrations with Excel, Chrome, and desktop environments. Early partner feedback from GitHub Copilot, Cursor, Notion, Warp, and others reports measurable benchmark improvements and new use cases previously out of reach.

Frontier Model Releases Evaluation and Benchmarking Notion Claude Opus 4.6 Lovable +12 more

7The Batch·19d ago·source ↗

Claude Opus 4.8 Launches with Improved Honesty; Anthropic Previews Mythos-Class Models and Dynamic Workflows

Anthropic released Claude Opus 4.8 with improvements in coding, reasoning, agentic tasks, and notably better uncertainty flagging—approximately four times less likely than Opus 4.7 to let code flaws pass uncommented. Alongside the model, Anthropic introduced dynamic workflows in Claude Code enabling tens to hundreds of parallel subagents for large-scale engineering tasks, an effort-control slider, and a 3x price cut on fast mode. Anthropic also previewed Mythos-class models, positioned above Opus in capability, currently available to a limited set of organizations for cybersecurity work pending broader safety clearance. The same digest covers MiniMax M3 (open-weights, ~60% SWE-Bench Pro), Nvidia's RTX Spark superchip, Cosmos 3 world model, and a GR00T/Unitree robotics partnership.

Frontier Model Releases Evaluation and Benchmarking Unitree Harvey Claude Mythos +16 more

8Anthropic News·1mo ago·source ↗

Anthropic Releases Claude Opus 4.7 with Enhanced Coding, Vision, and Cyber Safeguards

Anthropic has released Claude Opus 4.7, a general-availability model positioned as a meaningful improvement over Opus 4.6 in advanced software engineering, long-horizon agentic tasks, and vision capabilities including higher image resolution. The model is notably the first to receive new cybersecurity safeguards developed in response to Project Glasswing, with automatic detection and blocking of prohibited cyber uses and a new Cyber Verification Program for legitimate security professionals. Opus 4.7 is available across Claude products, API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at the same pricing as Opus 4.6 ($5/$25 per million input/output tokens). The release is explicitly positioned below Claude Mythos Preview in overall capability, serving as a testbed for safety mechanisms before broader deployment of Mythos-class models.

Frontier Model Releases Evaluation and Benchmarking Harvey Solve Intelligence Amazon Bedrock +16 more

9Anthropic News·17d ago·source ↗

Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku

Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models), a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed, and a public beta of 'computer use' — a capability allowing Claude to control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. Both safety institutes (US AISI and UK AISI) conducted pre-deployment testing, and the model was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.

Frontier Model Releases Evaluation and Benchmarking OpenAI o1-preview Amazon Bedrock Claude 3.5 Sonnet +15 more