7 · The Batch (DeepLearning.AI) · 4d ago

Data Points: GLM-5.2 leads open models on coding benchmarks; SpaceX acquires Cursor; OpenRouter Fusion; Anthropic coding study; ChatGPT market share drops

Zhipu released GLM-5.2, a 744B-parameter open model under an MIT license that ranks second only to Claude Opus 4.8 on long-horizon coding benchmarks including FrontierSWE and SWE-Marathon. It features a 1M-token context window and a 2.9× compute reduction via IndexShare attention. SpaceX is acquiring Cursor (Anysphere) for $60B in stock, positioning Musk's company to compete in AI software tools using xAI's Colossus infrastructure. OpenRouter launched Fusion, a multi-model synthesis tool showing that panels of budget models can match frontier model performance at half the cost. An Anthropic study of 400K Claude Code sessions found that domain expertise, not coding skill, is the primary driver of agentic output. Separately, a Munich court ruled Google liable for false claims in AI Overviews.

Related guides (5)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Claude Code

Claude Code: Anthropic's Autonomous Coding Agent

Claude Opus 4.6

Claude Opus 4.6: Anthropic's Milestone Model for Long-Context and Agentic Work

GPT-5.5

GPT-5.5: OpenAI's Benchmark-Leading Agentic Model with a Hallucination Problem

DeepSeek V4

DeepSeek V4: The Open-Weights Giant Reshaping AI Economics

Related events (8)

6 · Latent Space · 4d ago

GLM-5.2 claims top frontend coding performance; IndexShare speculative decoding introduced

A Latent Space AI news digest highlights GLM-5.2 as a new open-weights model claiming top performance on frontend coding tasks. The digest also covers IndexShare, a technique for speculative decoding. The body is truncated but the headline signals a notable open-weights model release and an inference optimization development.

Evaluation and Benchmarking · Open Weights Progress · IndexShare · GLM-5.1 · Latent Space

7 · The Batch · 11d ago

The Batch: Claude Mythos 5 / Fable 5 debut, Apple AFM 3, Google Live Translate, OpenAI IPO filing, FrontierCode benchmark

Anthropic launched Claude Fable 5 (a safety-guardrailed model) and Claude Mythos 5 (same underlying model with safeguards removed, for vetted cyberdefense/infrastructure users via Project Glasswing with US government collaboration), both priced at $10/$50 per million tokens. Apple released five new Apple Foundation Models (AFM 3) spanning on-device and cloud tiers, built with Google and Nvidia infrastructure. Additional headlines cover Google's Gemini 3.5 Live Translate (70+ languages, real-time), OpenAI's confidential SEC IPO filing, a NotebookLM upgrade to Gemini 3.5, and Cognition's FrontierCode benchmark for code-quality evaluation where Claude Opus 4.8 leads at 34.3%.

Frontier Model Releases · Evaluation and Benchmarking · Claude Mythos · Claude Opus 4.6 · Google

6 · The Batch · 20d ago

GLM-5.1 Open-Weights Model Targets Long-Running Agentic Tasks; Andrew Ng on Coding Agent Acceleration by Software Domain

Z.ai released GLM-5.1, an open-weights mixture-of-experts LLM (754B total / 40B active parameters) designed for sustained agentic coding tasks lasting up to eight hours, featuring iterative planning-execution-evaluation loops with thousands of tool calls. The model claims top open-weights performance on Artificial Analysis Intelligence Index and SWE-Bench Pro, available under MIT license via HuggingFace. The accompanying editorial by Andrew Ng offers a tiered framework for how much coding agents accelerate different software work categories—frontend most, then backend, infrastructure, and research least—with practical implications for team organization. A secondary item references data-center opposition and LLM helpfulness failure modes.

Frontier Model Releases · Evaluation and Benchmarking · DeepLearning.AI · Artificial Analysis Intelligence Index · SWE-bench

6 · The Batch · 9d ago

Cursor's Composer 2.5 rivals GPT-5.5 and Claude Opus 4.7 on coding benchmarks at lower cost

Cursor released Composer 2.5, a specialized agentic coding model built on Moonshot's Kimi K2.5 open weights with additional pretraining and reinforcement learning fine-tuning tailored to Cursor's own CLI harness. The model ranks third on the Artificial Analysis Coding Agent Index behind Claude Opus 4.7 and GPT-5.5 at max reasoning, but significantly undercuts them on cost ($0.44 vs $4.14 per task) and speed (6.7 vs 17.7 minutes). The training approach—co-optimizing model and harness together using synthetic tasks, text feedback during RL, and 25x more synthetic data than Composer 2—illustrates a specialist model strategy that challenges the dominance of generalist frontier models in coding workflows.

Frontier Model Releases · Inference Economics · SWE-Bench-Pro-Hard-AA · Claude Opus 4.6 · SpaceX

9 · Anthropic News · 20d ago

Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks

Anthropic has released Claude Opus 4.6, its most capable model to date, featuring a 1M token context window in beta, improved agentic coding and planning capabilities, and adaptive thinking with developer-controlled effort levels. The model claims top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, outperforming OpenAI's GPT-5.2 by 144 Elo points on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.

Long Context Evolution · Frontier Model Releases · GPT-5.2 · Claude Opus 4.6 · adaptive thinking

6 · The Batch · 19d ago

MiniMax M2.7 proprietary reasoning model competes with Gemini and Claude Opus; roundup covers Cursor Composer 2, MAI-Image-2, Claude Code Channels, and Anthropic defense dispute

MiniMax released M2.7, a proprietary reasoning model that achieved 66.6% on MLE Bench Lite (tying Gemini 3.1) and 56.22% on SWE-Pro, priced at $0.30/$1.20 per million tokens. The shift to proprietary marks a potential strategic pivot among Chinese AI labs away from open weights. Cursor released Composer 2, an agentic coding model built on a fine-tuned Kimi K2.5 (via a Moonshot partnership), priced 86% cheaper than its predecessor and scoring 73.7 on SWE-bench Multilingual. Anthropic released Claude Code Channels, which routes Telegram and Discord messages into local Claude Code sessions via MCP plugins, and separately filed a court response denying it has any backdoor or kill switch into military deployments of Claude. Microsoft announced MAI-Image-2, a text-to-image model ranking third on Arena.ai among research labs.

Frontier Model Releases · Open Weights Progress · Stitch · Claude Sonnet 4 · SWE-Pro

7 · The Batch · 20d ago

Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation

Z.ai released GLM-5.1, a 754B parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, capable of cycling through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model achieves top open-weights scores on the Artificial Analysis Intelligence Index and third place on Arena's Code leaderboard, while leading SWE-Bench Pro in Z.ai's own evaluations at 58.4 percent. Weights are available on HuggingFace under MIT license, with API pricing roughly 40 percent higher than its predecessor but still below comparable proprietary models. No technical report has been published, leaving architecture and training details undisclosed.

Frontier Model Releases · Evaluation and Benchmarking · Gemini 3.1 Pro · Artificial Analysis Intelligence Index · Claude Opus 4.6

5 · Latent Space · 2d ago

GLM-5.2 passes community vibe checks; Z.ai forecasts Open Fable by December

GLM-5.2, a new open model, is reportedly passing community vibe checks and drawing comparisons to GPT-class frontier models. Z.ai has forecast the release of Open Fable by December. The item signals a potential shift in the open-weights landscape toward genuine frontier-level capability.

Frontier Model Releases · Open Weights Progress · Open Fable · GLM-5.1 · Z.ai