
Claude Opus 4.8
claude-opus-4-8-931f9891 · 5 events · first seen 12d ago
Aliases: Anthropic Opus 4.8
Recent events (5)
Red-team study finds Anthropic Fable 5 and Opus 4.8 remain reliably breakable under automated jailbreak attacks
A preprint evaluates the adversarial robustness of two Anthropic frontier models, Fable 5 and Opus 4.8, against four families of automated jailbreak attacks across 7,826 harmful intents. Using the HackAgent framework, the study generated hundreds of thousands of adversarial attempts, and a three-judge panel confirmed 1,620 harmful completions from Opus 4.8 and 702 from Fable 5. Tree-of-attacks adaptive search achieved 11.5% intent-level success against Opus 4.8 and 6.1% against Fable 5, while static obfuscation was almost entirely neutralized. The authors conclude that even the most hardened frontier models remain reliably breakable under sustained automated pressure, and they caution against reading aggregate resistance rates as reassurance.
Anthropic launches Claude Tag: persistent, multiplayer Claude agent for Slack teams
Anthropic has released Claude Tag, a Slack-native agentic product, in beta for Enterprise and Team customers. Teams tag @Claude in a channel, where it operates as a shared, persistent team member with memory, tool access, and asynchronous task execution. Claude Tag builds on Claude Code and Cowork, adding multiplayer context (one Claude per channel, visible to all), ambient proactive updates, and the ability to schedule and pursue tasks autonomously over hours or days. Anthropic reports that 65% of its own product team's code is now generated by an internal version of Claude Tag. The product runs on Claude Opus 4.8 and lets administrators scope tool access, data permissions, and token spend per channel.
Z.ai releases GLM-5.2, a 753B MoE open-weights model claiming top open-model ranking on agentic coding benchmarks
Z.ai released GLM-5.2, a 753-billion-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, with a 1-million-token input context and MIT license. The model ranks first among open-weights models on Artificial Analysis's Intelligence Index v4.1 (score 51, behind Claude Opus 4.8 at 56 and GPT-5.5 at 55) and leads all models on PostTrainBench, a benchmark for agentic fine-tuning tasks. Key technical contributions include a modified sparse attention indexer applied every four layers (cutting per-token computation 2.9x at 1M context), a switch from GRPO to PPO for long-horizon RL training, and a reward-hacking mitigation pipeline using rule-based filters and a judge model. API pricing is substantially below comparable proprietary models, and the release coincides with U.S. government restrictions on access to Anthropic's frontier models.
The Batch: Jalapeño inference chip, Fugu multi-agent system, Claude Tag, Robin bio-agent, and Getty-OpenAI deal
OpenAI and Broadcom announced Jalapeño, OpenAI's first custom inference chip, designed in nine months with AI-assisted design and showing better performance-per-watt than current accelerators; engineering samples are already running GPT-5.3-Codex-Spark with datacenter deployment planned by end of 2026. Sakana AI released Fugu, a multi-agent routing system that scored 73.7% on SWE-Bench Pro, outperforming Claude Opus 4.8 and GPT-5.5 while remaining below the inaccessible Fable 5. Additional items cover Anthropic's Claude Tag Slack integration for async team collaboration, Seedance 2.5 video model improvements, the Robin autonomous biology research agent that identified a novel drug candidate, and a Getty Images licensing partnership with OpenAI.
Apple Foundation Models 3 (AFM 3) bring on-device AI to iPhones and Macs via Google Gemini distillation
Apple announced its third-generation Foundation Models (AFM 3), a family of models distilled from Google Gemini and designed to run on-device on Apple silicon, including iPhones and Macs. The flagship on-device model, AFM 3 Core Advanced, uses a novel 'Instruction-Following Pruning' technique as an alternative to standard mixture-of-experts routing, enabling faster inference and flash-memory storage with 20B total parameters but only 1-4B active. The family also includes cloud-hosted variants (AFM 3 Cloud, Cloud Image, Cloud Pro), and Apple's Foundation Models Framework will allow developers to swap in third-party models like Claude or Gemini. No public benchmark results have been released yet; Apple says they will follow later in 2026.