The Batch Issue 359: Loop Engineering for Agentic Coding, GLM-5.2 Open-Weights Release, Apple On-Device Models
Andrew Ng's weekly letter introduces a framework of three nested loops for agentic software development (engineering loop, developer feedback loop, external feedback loop), contextualizing the 'loop engineering' trend popularized by Claude Code and OpenClaw creators. The issue also covers Z.ai's GLM-5.2, a 753B MoE open-weights model with 1M token context that claims first place among open models on Artificial Analysis Intelligence Index v4.1 and leads all models on PostTrainBench for long-running agentic tasks. Additional coverage includes Apple's recipe for on-device models and AI education trends.
Related guides (4)
Related events (8)
GLM-5.1 Open-Weights Model Targets Long-Running Agentic Tasks; Andrew Ng on Coding Agent Acceleration by Software Domain
Z.ai released GLM-5.1, an open-weights mixture-of-experts LLM (754B total / 40B active parameters) designed for sustained agentic coding tasks lasting up to eight hours, featuring iterative planning-execution-evaluation loops with thousands of tool calls. The model claims top open-weights performance on Artificial Analysis Intelligence Index and SWE-Bench Pro, available under MIT license via HuggingFace. The accompanying editorial by Andrew Ng offers a tiered framework for how much coding agents accelerate different software work categories—frontend most, then backend, infrastructure, and research least—with practical implications for team organization. A secondary item references data-center opposition and LLM helpfulness failure modes.
Andrew Ng outlines three-loop framework for agentic software development
Andrew Ng describes a 'loop engineering' framework for building software with AI coding agents, comprising an agentic coding loop (agent writes/tests/iterates autonomously), a developer feedback loop (human steers at higher product level), and an external feedback loop (user testing, A/B). The piece contextualizes the buzzphrase popularized by Claude Code creator Boris Cherny and OpenClaw creator Peter Steinberger. Ng argues humans retain a 'context advantage' over AI systems that justifies continued human-in-the-loop involvement in product decisions.
Z.ai releases GLM-5.2, a 753B MoE open-weights model claiming top open-model ranking on agentic coding benchmarks
Z.ai released GLM-5.2, a 753-billion-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, with a 1-million-token input context and MIT license. The model ranks first among open-weights models on Artificial Analysis's Intelligence Index v4.1 (score 51, behind Claude Opus 4.8 at 56 and GPT-5.5 at 55) and leads all models on PostTrainBench, a benchmark for agentic fine-tuning tasks. Key technical contributions include a modified sparse attention indexer applied every four layers (cutting per-token computation 2.9x at 1M context), a switch from GRPO to PPO for long-horizon RL training, and a reward-hacking mitigation pipeline using rule-based filters and a judge model. API pricing is substantially below comparable proprietary models, and the release coincides with U.S. government restrictions on access to Anthropic's frontier models.
Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation
Z.ai released GLM-5.1, a 754B parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, capable of cycling through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model achieves top open-weights scores on the Artificial Analysis Intelligence Index and third place on Arena's Code leaderboard, while leading SWE-Bench Pro in Z.ai's own evaluations at 58.4 percent. Weights are available on HuggingFace under MIT license, with API pricing roughly 40 percent higher than its predecessor but still below comparable proprietary models. No technical report has been published, leaving architecture and training details undisclosed.
The Batch Issue 356: Qwen3.7-Max release, White House AI executive order, fine-tuning breaks copyright alignment
The Batch issue 356 covers several distinct AI developments: Alibaba's release of Qwen3.7-Max, a closed-weights flagship LLM targeting agentic coding and scientific tasks with a novel RL training approach that decouples task, harness, and verifier; a new White House executive order on frontier AI models focused on cybersecurity, including voluntary model-sharing with government; and a finding that fine-tuning breaks copyright alignment in LLMs. Andrew Ng's editorial commentary frames the executive order as a reasonable compromise, noting Anthropic's Mythos vulnerability-detection model as a key driver of the cybersecurity concerns behind the regulation.
Data Points: GLM-5.2 leads open models on coding benchmarks; SpaceX acquires Cursor; OpenRouter Fusion; Anthropic coding study; ChatGPT market share drops
Zhipu released GLM-5.2, a 744B-parameter open model under MIT license that ranks second only to Claude Opus 4.8 on long-horizon coding benchmarks including FrontierSWE and SWE-Marathon, featuring a 1M-token context window and a 2.9× compute reduction via IndexShare attention. SpaceX is acquiring Cursor (Anysphere) for $60B in stock, positioning Musk's company to compete in AI software tools using xAI's Colossus infrastructure. OpenRouter launched Fusion, a multi-model synthesis tool showing that budget model panels can match frontier model performance at half the cost. An Anthropic study of 400K Claude Code sessions found domain expertise—not coding skill—is the primary driver of agentic output, while a Munich court ruled Google liable for false claims in AI Overviews.
The Batch Issue 346: Nvidia Nemotron Super 120B, OpenAI-Amazon Deal, Regulatory Commentary
The Batch's weekly digest covers Nvidia's release of Nemotron 3 Super 120B-A12B, an open-weights hybrid mamba-2/transformer/MoE model with 1M token context trained on 25 trillion tokens, positioned as a speed leader in its size class for agentic applications. The issue also touches on OpenAI's Amazon deal and Grok video pricing cuts. Editor Andrew Ng's letter addresses the White House's proposed federal AI preemption framework and critiques what he characterizes as coordinated anti-AI messaging campaigns. Multiple significant industry developments are bundled in a single newsletter digest.
The Batch digest: U.S. chatbot adoption tops 50%, AA-Briefcase benchmark, ARD spec, North Mini Code, Fable/Mythos export controls
A weekly digest from DeepLearning.AI covers five AI developments: a Pew Research Center survey showing nearly half of U.S. adults now use AI chatbots (ChatGPT at 44% adoption); Artificial Analysis releasing AA-Briefcase, a new benchmark for complex knowledge-work tasks where Claude Opus 4.8 is a top performer; Hugging Face publishing a reference implementation of the Agentic Resource Discovery (ARD) open spec co-developed with Microsoft, Google, and others for runtime tool discovery by agents; Cohere releasing North Mini Code, a 30B-parameter open-weight MoE coding model under Apache 2.0; and over 100 cybersecurity professionals signing an open letter urging the U.S. government to reverse export controls on Anthropic's Claude Fable 5 and Claude Mythos 5. The ARD and export-control items are the highest-signal stories, touching agent infrastructure standards and AI regulatory policy respectively.



