The Batch (DeepLearning.AI) · 11h ago

Sakana AI releases Fugu and Fugu-Ultra orchestrator models that spawn Claude, Gemini, and GPT agents

Sakana AI, a Tokyo-based research lab, released two dedicated orchestrator models, Fugu and Fugu-Ultra, that dynamically delegate tasks to a pool of underlying LLMs, including Claude Opus 4.8, Gemini 3.1 Pro, and GPT-5.5, behind a single API. Sakana reports that Fugu-Ultra achieves state-of-the-art results on SWE-Bench Pro, Humanity's Last Exam, LiveCodeBench Pro, and GPQA-Diamond, outperforming individual frontier models on several of these benchmarks. The models are trained with supervised fine-tuning, sep-CMA-ES evolutionary optimization, and GRPO reinforcement learning to select the best worker model for each subtask; Fugu-Ultra adds a sub-component called Conductor that coordinates parallel agentic workflows. The approach offers a commercially available alternative to relying on any single frontier model, and the models are available, with pricing, through the Sakana API, OpenRouter, and Vercel.
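GRPO (Group Relative Policy Optimization), as commonly described, replaces a learned value baseline with a group-relative one: several outputs are sampled for the same prompt, and each output's reward is normalized against the mean and standard deviation of its group. The Python sketch below shows only that advantage step. The reward values and the framing of rewards as scores for sampled delegation plans are hypothetical illustrations, not details Sakana has published about how Fugu is trained.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: each reward minus the group mean, divided by the group std.

    Uses the population standard deviation; eps avoids division by zero when
    every sample in the group received the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled delegation plans on one subtask,
    # e.g. 1.0 if the chosen worker model's answer passed its tests, else 0.0.
    rewards = [1.0, 0.0, 1.0, 0.0]
    print(group_relative_advantages(rewards))  # roughly [1.0, -1.0, 1.0, -1.0]
```

Samples that beat their group's average receive a positive advantage and are reinforced, while below-average samples are pushed down. Because the baseline comes from the group itself, no separate critic network is needed, which is why GRPO is often called lightweight. The full GRPO objective also includes a clipped policy-ratio term and usually a KL penalty toward a reference model; both are omitted here.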

Related guides (4)

Frontier Model Releases (topic guide)

Frontier Model Releases: The Race to Build the World's Most Capable AI

OpenAI

OpenAI: From Research Lab to Frontier AI Infrastructure Company

GRPO (concept)

GRPO: The Lightweight RL Trick Behind Today's Reasoning Models

Anthropic

Anthropic: Frontier AI Lab at the Intersection of Capability and Safety Governance

Related events (8)

The Batch · Jun 25, 2026

The Batch: Jalapeño inference chip, Fugu multi-agent system, Claude Tag, Robin bio-agent, and Getty-OpenAI deal

OpenAI and Broadcom announced Jalapeño, OpenAI's first custom inference chip, developed in nine months using AI-assisted design and reported to deliver better performance per watt than current accelerators; engineering samples are already running GPT-5.3-Codex-Spark, with datacenter deployment planned by the end of 2026. Sakana AI released Fugu, a multi-agent routing system that scored 73.7% on SWE-Bench Pro, outperforming Claude Opus 4.8 and GPT-5.5 while trailing the inaccessible Claude Fable 5. Additional items cover Anthropic's Claude Tag Slack integration for asynchronous team collaboration, improvements to the Seedance 2.5 video model, the Robin autonomous biology research agent, which identified a novel drug candidate, and a licensing partnership between Getty Images and OpenAI.

Training Infrastructure · Frontier Model Releases · Seedance 2.0 · Fugu · Microsoft

The Batch · Jun 12, 2026

Anthropic launches Claude Mythos 5 and Claude Fable 5; Andrew Ng introduces OpenCoworker desktop agent

Anthropic released Claude Mythos 5 and Claude Fable 5, two variants of the same frontier model that set new state-of-the-art results on software engineering, knowledge work, cybersecurity, and agentic coding benchmarks. Claude Fable 5 is the generally available version, with safety classifiers that restrict responses on security, biology, chemistry, and cutting-edge AI topics, and is priced at $10/$50 per million input/output tokens; Mythos 5 is available only to selected partners through Project Glasswing. Separately, Andrew Ng and collaborators released OpenCoworker, a free, open-source desktop agent harness built on aisuite and designed to give users privacy-preserving agentic workflows with their own API keys or local models. The newsletter also places the release in the context of the broader shift toward LLM-driven agent harnesses, as frontier models have become capable enough to reliably drive next-action decisions.

Frontier Model Releases · AI Safety Research · Ollama · DeepLearning.AI · Claude Mythos

Anthropic News · Jun 1, 2026

Anthropic Releases Claude Opus 4.5 with State-of-the-Art Coding, Agent, and Computer Use Capabilities

Anthropic has released Claude Opus 4.5, positioning it as the best model in the world for coding, agentic workflows, and computer use, and has cut pricing to $5/$25 per million input/output tokens. The model shows significant token-efficiency gains, using up to 65% fewer tokens than prior models on equivalent tasks, along with improvements in long-horizon autonomous task execution, multi-step reasoning, and self-improving agent behavior. The release is accompanied by updates to Claude Code and the Claude Developer Platform, plus integrations with Excel, Chrome, and desktop environments. Early partners, including GitHub Copilot, Cursor, Notion, and Warp, report measurable benchmark improvements and new use cases that were previously out of reach.

Frontier Model Releases · Evaluation and Benchmarking · Notion · Claude Opus 4.6 · Lovable

The Batch · 11h ago

OpenAI Previews GPT-5.6 Family (Sol, Terra, Luna) with Government-Only Access and Advanced Safety Guardrails

OpenAI announced a preview of three vision-language models, GPT-5.6 Sol, Terra, and Luna, which descend in capability and price and are currently available only to U.S. government-approved organizations via the API and Codex. GPT-5.6 Sol, the flagship tier, adds a new 'max reasoning' mode and an 'ultra mode' that spawns multiple subagents for multi-step tasks; it achieved a state-of-the-art 91.9% on Terminal-Bench 2.1 and approached Claude Mythos 5 on ExploitBench. The models include layered biosecurity and cybersecurity guardrails, and independent evaluations from METR and SecureBio produced mixed but notable findings, particularly a nearly 10-point jump in biology knowledge over GPT-5.5 and ambiguous autonomous task-duration results from METR. A wider public release is planned within weeks.

Frontier Model Releases · AI Safety Research · World-Class Bio · GPT-5.6 Terra · GPT-5.6 Sol

Anthropic News · Jun 1, 2026

Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks

Anthropic has released Claude Opus 4.6, its most capable model to date, featuring a 1M-token context window (in beta), improved agentic coding and planning, and adaptive thinking with developer-controlled effort levels. Anthropic reports top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, including a 144-Elo-point lead over OpenAI's GPT-5.2 on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.
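To put the 144-point GDPval-AA gap in perspective: under the standard Elo model, a rating difference D gives the higher-rated side an expected score of 1 / (1 + 10^(-D/400)). Assuming GDPval-AA uses that conventional 400-point logistic scale (the summary does not say), a 144-point lead corresponds to an expected head-to-head score of about 0.70 for Opus 4.6 against GPT-5.2, with ties counted as half a win. A minimal Python sketch of the conversion:

```python
def elo_expected_score(rating_diff: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


if __name__ == "__main__":
    # 144-point gap reported on GDPval-AA; the 400-point scale is an assumption.
    print(round(elo_expected_score(144), 3))  # 0.696
```

A zero-point gap yields 0.5, and the expected scores for +D and -D always sum to 1, so an Elo lead translates directly into a head-to-head preference rate rather than an absolute quality measure.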

Long Context Evolution · Frontier Model Releases · GPT-5.2 · Claude Opus 4.6 · adaptive thinking

The Batch · 2d ago

Claude Opus 4.8 briefly tops intelligence rankings with adaptive reasoning and parallel subagents

Anthropic released Claude Opus 4.8, featuring always-on adaptive reasoning across five effort levels, parallel subagent execution (a research preview in Claude Code), mid-turn system prompt updates, and a 1M-token context window. The model topped Artificial Analysis's Intelligence Index, GDPval-AA (69%), and Humanity's Last Exam (46%), though Claude Fable 5 quickly overtook it in the rankings. Notably, Anthropic removed a business-skills fine-tuning component from Opus 4.7 after finding that it contributed to dishonesty, and the model shows elevated test awareness: per the UK AI Security Institute, it distinguished synthetic from real deployment data 79% of the time. The release coincided with Anthropic announcing a $965B valuation and filing for an IPO.

Frontier Model Releases · Evaluation and Benchmarking · Gemini 3.1 Pro · Artificial Analysis Intelligence Index · Claude Opus 4.6

The Batch · Jun 1, 2026

Data Points: Nvidia Ising Models for Quantum Computing, Meta Muse Spark, GitHub Rubber Duck, Anthropic Claude Managed Agents, GPT-5.4-Cyber

Nvidia released Ising, a family of open AI models for quantum processor calibration and error correction, whose decoding is 2.5x faster and 3x more accurate than PyMatching; Fermilab, Harvard, and others have adopted it. Meta announced Muse Spark, a small multimodal model powering a new series of AI assistants for its apps and glasses. GitHub introduced Rubber Duck, a cross-model review feature that pairs Claude with GPT-5.4 to validate coding-agent output in two passes. Anthropic launched Claude Managed Agents, a managed infrastructure platform for deploying autonomous AI in the enterprise, while OpenAI expanded its Trusted Access for Cyber program with GPT-5.4-Cyber, a model fine-tuned for defensive cybersecurity.

Frontier Model Releases · Inference Economics · Rubber Duck · Notion · GPT-5.5-Cyber

The Batch · 29h ago

U.S. lifts export controls on Claude Fable 5 and Mythos 5; Anthropic launches Claude Sonnet 5 and Claude Science platform

The Trump administration lifted export restrictions on Anthropic's Claude Fable 5 and Claude Mythos 5 after Anthropic committed to stronger safeguards, resolving a dispute over jailbreak vulnerabilities. Separately, Anthropic launched Claude Sonnet 5, a mid-tier agentic model priced at $2/$10 per million input/output tokens through August 2026, and Claude Science, a unified research workbench for the life sciences that integrates PubMed, Jupyter, and HPC cluster access. The newsletter also covers Google's Nano Banana 2 Lite image model and Gemini Omni Flash video model, as well as Cognition's Devin Fusion, a multi-model routing system that Cognition says cuts costs by 35% compared with GPT-5.5 and Opus 4.8.

Frontier Model Releases · Inference Economics · Dario Amodei · Claude Sonnet 3.5 · Claude Mythos