5 · The Batch (DeepLearning.AI) · 3d ago

The Batch Issue 359: Loop Engineering for Agentic Coding, GLM-5.2 Open-Weights Release, Apple On-Device Models

Andrew Ng's weekly letter introduces a framework of three nested loops for agentic software development (an agentic coding loop, a developer feedback loop, and an external feedback loop), putting in context the 'loop engineering' trend popularized by the creators of Claude Code and OpenClaw. The issue also covers Z.ai's GLM-5.2, a 753B-parameter MoE open-weights model with a 1M-token context that claims first place among open models on the Artificial Analysis Intelligence Index v4.1 and leads all models on PostTrainBench for long-running agentic tasks. Additional coverage includes Apple's recipe for on-device models and AI education trends.

Frontier Model Releases · Evaluation and Benchmarking · Open Weights Progress · Agent and Tool Ecosystem · DeepLearning.AI · Artificial Analysis Intelligence Index · Boris Cherny · Claude Code · GLM-5.1 · Apple · Andrew Ng · Z.ai · PostTrainBench

Related guides (4)

Frontier Model Releases · Topic guide

Frontier Model Releases: The Race From Language to Action


Claude Code

Claude Code: Anthropic's Autonomous Coding Agent


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

6 · The Batch · 28d ago

GLM-5.1 Open-Weights Model Targets Long-Running Agentic Tasks; Andrew Ng on Coding Agent Acceleration by Software Domain

Z.ai released GLM-5.1, an open-weights mixture-of-experts LLM (754B total / 40B active parameters) designed for sustained agentic coding tasks lasting up to eight hours, featuring iterative planning-execution-evaluation loops with thousands of tool calls. The model claims top open-weights performance on Artificial Analysis Intelligence Index and SWE-Bench Pro, available under MIT license via HuggingFace. The accompanying editorial by Andrew Ng offers a tiered framework for how much coding agents accelerate different software work categories—frontend most, then backend, infrastructure, and research least—with practical implications for team organization. A secondary item references data-center opposition and LLM helpfulness failure modes.

Frontier Model Releases · Evaluation and Benchmarking · DeepLearning.AI · Artificial Analysis Intelligence Index · SWE-bench (+9 more)

5 · The Batch · 3d ago

Andrew Ng outlines three-loop framework for agentic software development

Andrew Ng describes a 'loop engineering' framework for building software with AI coding agents, comprising an agentic coding loop (agent writes/tests/iterates autonomously), a developer feedback loop (human steers at higher product level), and an external feedback loop (user testing, A/B). The piece contextualizes the buzzphrase popularized by Claude Code creator Boris Cherny and OpenClaw creator Peter Steinberger. Ng argues humans retain a 'context advantage' over AI systems that justifies continued human-in-the-loop involvement in product decisions.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · DeepLearning.AI · Boris Cherny · Claude Code (+2 more)

7 · The Batch · 3d ago

Z.ai releases GLM-5.2, a 753B MoE open-weights model claiming top open-model ranking on agentic coding benchmarks

Z.ai released GLM-5.2, a 753-billion-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, with a 1-million-token input context and MIT license. The model ranks first among open-weights models on Artificial Analysis's Intelligence Index v4.1 (score 51, behind Claude Opus 4.8 at 56 and GPT-5.5 at 55) and leads all models on PostTrainBench, a benchmark for agentic fine-tuning tasks. Key technical contributions include a modified sparse attention indexer applied every four layers (cutting per-token computation 2.9x at 1M context), a switch from GRPO to PPO for long-horizon RL training, and a reward-hacking mitigation pipeline using rule-based filters and a judge model. API pricing is substantially below comparable proprietary models, and the release coincides with U.S. government restrictions on access to Anthropic's frontier models.

Open Weights Progress · Inference Economics · Artificial Analysis Intelligence Index · AA-Briefcase · DeepSeek V4 (+14 more)

7 · The Batch · 28d ago

Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation

Z.ai released GLM-5.1, a 754B-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, capable of cycling through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model achieves the top open-weights score on the Artificial Analysis Intelligence Index and third place on Arena's Code leaderboard, and it leads SWE-Bench Pro in Z.ai's own evaluations at 58.4 percent. Weights are available on HuggingFace under the MIT license, with API pricing roughly 40 percent higher than its predecessor's but still below that of comparable proprietary models. No technical report has been published, leaving architecture and training details undisclosed.

Frontier Model Releases · Evaluation and Benchmarking · Gemini 3.1 Pro · Artificial Analysis Intelligence Index · Claude Opus 4.6 (+14 more)

6 · The Batch · 24d ago

The Batch Issue 356: Qwen3.7-Max release, White House AI executive order, fine-tuning breaks copyright alignment

The Batch issue 356 covers several distinct AI developments: Alibaba's release of Qwen3.7-Max, a closed-weights flagship LLM targeting agentic coding and scientific tasks with a novel RL training approach that decouples task, harness, and verifier; a new White House executive order on frontier AI models focused on cybersecurity, including voluntary model-sharing with government; and a finding that fine-tuning breaks copyright alignment in LLMs. Andrew Ng's editorial commentary frames the executive order as a reasonable compromise, noting Anthropic's Mythos vulnerability-detection model as a key driver of the cybersecurity concerns behind the regulation.

Frontier Model Releases · AI Safety Research · Qwen3.7-Plus-Preview · DeepLearning.AI · Artificial Analysis Intelligence Index (+9 more)

7 · The Batch · 12d ago

Data Points: GLM-5.2 leads open models on coding benchmarks; SpaceX acquires Cursor; OpenRouter Fusion; Anthropic coding study; ChatGPT market share drops

Zhipu released GLM-5.2, a 744B-parameter open model under MIT license that ranks second only to Claude Opus 4.8 on long-horizon coding benchmarks including FrontierSWE and SWE-Marathon, featuring a 1M-token context window and a 2.9× compute reduction via IndexShare attention. SpaceX is acquiring Cursor (Anysphere) for $60B in stock, positioning Musk's company to compete in AI software tools using xAI's Colossus infrastructure. OpenRouter launched Fusion, a multi-model synthesis tool showing that budget model panels can match frontier model performance at half the cost. An Anthropic study of 400K Claude Code sessions found domain expertise—not coding skill—is the primary driver of agentic output, while a Munich court ruled Google liable for false claims in AI Overviews.

Frontier Model Releases · Evaluation and Benchmarking · DRACO · FrontierSWE · Anysphere (+24 more)

6 · The Batch · 27d ago

The Batch Issue 346: Nvidia Nemotron Super 120B, OpenAI-Amazon Deal, Regulatory Commentary

The Batch's weekly digest covers Nvidia's release of Nemotron 3 Super 120B-A12B, an open-weights hybrid Mamba-2/Transformer/MoE model with a 1M-token context trained on 25 trillion tokens, positioned as a speed leader in its size class for agentic applications. The issue also touches on OpenAI's Amazon deal and Grok video pricing cuts. Editor Andrew Ng's letter addresses the White House's proposed federal AI preemption framework and critiques what he characterizes as coordinated anti-AI messaging campaigns. Multiple significant industry developments are bundled in a single newsletter digest.

Frontier Model Releases · Open Weights Progress · Nemotron 3 Super 120B-A12B · Nemotron 3 Ultra-500B-A50B · DeepLearning.AI (+9 more)

6 · The Batch · 7d ago

The Batch digest: U.S. chatbot adoption tops 50%, AA-Briefcase benchmark, ARD spec, North Mini Code, Fable/Mythos export controls

A weekly digest from DeepLearning.AI covers five AI developments: a Pew Research Center survey showing nearly half of U.S. adults now use AI chatbots (ChatGPT at 44% adoption); Artificial Analysis releasing AA-Briefcase, a new benchmark for complex knowledge-work tasks where Claude Opus 4.8 is a top performer; Hugging Face publishing a reference implementation of the Agentic Resource Discovery (ARD) open spec co-developed with Microsoft, Google, and others for runtime tool discovery by agents; Cohere releasing North Mini Code, a 30B-parameter open-weight MoE coding model under Apache 2.0; and over 100 cybersecurity professionals signing an open letter urging the U.S. government to reverse export controls on Anthropic's Claude Fable 5 and Claude Mythos 5. The ARD and export-control items are the highest-signal stories, touching agent infrastructure standards and AI regulatory policy respectively.

Evaluation and Benchmarking · Open Weights Progress · Artificial Analysis · DeepLearning.AI · Claude Mythos (+22 more)