Almanac
4·The Batch (DeepLearning.AI)·19d ago

Coding Agents Accelerate Some Software Tasks More Than Others

Andrew Ng offers a practitioner framework ranking how much coding agents accelerate different software work: frontend development benefits most (agents close the loop via browser feedback), followed by backend, infrastructure, and research in decreasing order. Backend work still requires skilled developers to handle corner cases and security; infrastructure decisions remain largely human-driven due to complex tradeoffs and limited LLM knowledge in that domain; research is least accelerated because ideation and hypothesis iteration are not primarily coding tasks. The commentary is aimed at helping engineering managers set realistic expectations and organize teams accordingly.

Related events (8)

4·The Batch·19d ago

AI-Native Software Development Needs Generalists

Andrew Ng argues that agentic coding tools are reshaping software team structures by accelerating code production so dramatically that product management, design, marketing, and legal review become the new bottlenecks. He contends that the fastest-moving teams are small (2–10 people), co-located, and composed of generalists who can span engineering, product, and other functions. The piece frames this as a structural shift away from large specialist teams toward individuals who combine deep skills with cross-functional breadth.

6·The Batch·19d ago

GLM-5.1 Open-Weights Model Targets Long-Running Agentic Tasks; Andrew Ng on Coding Agent Acceleration by Software Domain

Z.ai released GLM-5.1, an open-weights mixture-of-experts LLM (754B total / 40B active parameters) designed for sustained agentic coding tasks lasting up to eight hours, featuring iterative planning-execution-evaluation loops with thousands of tool calls. It is claimed to deliver top open-weights performance on the Artificial Analysis Intelligence Index and SWE-Bench Pro, and is available under the MIT license via Hugging Face. The accompanying editorial by Andrew Ng offers a tiered framework for how much coding agents accelerate different categories of software work (frontend most, then backend, infrastructure, and research least), with practical implications for team organization. A secondary item references data-center opposition and LLM helpfulness failure modes.

6·arXiv · cs.AI·11d ago

Frontier coding agents use metaprogramming to handle esoteric programming languages

A new arXiv paper evaluates six LLM-based coding agents on four esoteric programming languages (including Brainfuck and Befunge-98), finding that the strongest agents—Claude Opus 4.6 and GPT-5.4 xhigh—often avoid writing the target language directly, instead generating it via Python metaprograms. Forbidding this strategy causes large performance drops, and text guidance alone does not transfer the capability to weaker models, though sharing Opus-derived Python helper code does sharply improve mid-tier agents. The study reveals capability stratification that mainstream benchmarks like SWE-Bench Verified compress into narrow bands, suggesting frontier agents succeed by constructing and debugging working models of unfamiliar environments rather than pattern-matching to training data.
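As a toy illustration of the metaprogramming strategy described above (not code from the paper), the sketch below uses Python to generate a Brainfuck program instead of writing Brainfuck by hand, then checks it with a minimal interpreter. The names `bf_for_text` and `run_bf` are hypothetical, and the generator assumes characters with code points below 256.

```python
def bf_for_text(text: str) -> str:
    """Generate Brainfuck that prints `text` using one cell and +/- deltas.

    Assumes every character has a code point below 256 (one 8-bit cell).
    """
    program = []
    current = 0  # value the single cell holds at this point in the program
    for ch in text:
        delta = ord(ch) - current
        program.append(("+" if delta > 0 else "-") * abs(delta))
        program.append(".")  # output the cell as a character
        current = ord(ch)
    return "".join(program)


def run_bf(program: str) -> str:
    """Minimal Brainfuck interpreter (no input command), used to check output."""
    # Precompute matching bracket positions for loop jumps.
    stack, jumps = [], {}
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape, ptr, pc, out = [0] * 30000, 0, 0, []
    while pc < len(program):
        c = program[pc]
        if c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)


if __name__ == "__main__":
    generated = bf_for_text("Hi")
    print(run_bf(generated))
```

The design point is that the generator and the checker both live in a familiar language, so mistakes in the esoteric target can be found by running and debugging ordinary Python.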

4·One Useful Thing·1mo ago

Claude Code and What Comes Next

A commentary piece from One Useful Thing examining Claude Code and its implications for AI-assisted software development. The author reflects on what agentic coding tools can accomplish with the right scaffolding and considers near-term trajectories. Published in early January 2026, this represents a tier-2 analyst perspective on Anthropic's coding agent product.

4·Latent Space·1mo ago

AINews: Agents for Everything Else — Codex for Knowledge Work, Claude for Creative Work

A Latent Space daily AI news digest reflecting on the expanding scope of coding agents beyond software development into knowledge work and creative work. The piece uses OpenAI Codex and Anthropic Claude as anchoring examples of agents 'breaking containment' from their original coding and assistant niches. Published as commentary on a quieter news day, it surveys the broadening agent ecosystem.

6·Latent Space·18d ago

GitHub's plan for agentic coding — Kyle Daigle interview on Latent Space

Latent Space interviews Kyle Daigle of GitHub about the company's strategy for agentic coding workflows and the platform pressures created by the explosion in AI-assisted development following Copilot. The discussion covers how GitHub is adapting its infrastructure and product direction to support agents operating at scale. This is a strategic signal from one of the most central platforms in the developer AI ecosystem.

7·arXiv · cs.AI·1mo ago

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

This paper introduces agent just-in-time (JIT) compilation as an alternative to the sequential fetch-screenshot-execute loop used by current computer-use agents. The approach compiles natural language task descriptions directly into executable code that can include LLM calls, tool calls, and parallelization, using three components: JIT-Planner, JIT-Scheduler, and an invariant-enforcing tool protocol. Across five web applications, JIT-Planner achieves a 10.4× speedup and +28% accuracy over Browser-Use, while JIT-Scheduler achieves a 2.4× speedup and +9% accuracy over OpenAI CUA.

6·The Batch·28d ago

Agent Benchmarks Skew Toward Software Engineering, Missing Most Economically Valuable Labor

Researchers from Carnegie Mellon University and Stanford University mapped over 10,000 examples from 43 agent benchmarks to U.S. labor statistics using O*NET occupational taxonomies, finding that current benchmarks heavily over-represent software engineering relative to its share of employment and wages. Office and administrative support (18.2M workers, $869.8B wages) and management (11M workers, $1326.3B wages) are vastly under-represented compared to computer and mathematical occupations (5.2M workers, $563.6B wages). No single benchmark covered more than 50% of work activities, and all 43 benchmarks combined covered only 56.5% of work activities. The study identifies a systematic gap between where agentic AI is being evaluated and where the largest economic opportunity lies.
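To put the quoted labor figures side by side, here is a small arithmetic sketch. It uses only the worker and wage totals from the summary above; the per-worker averages and the ratios are derived here for illustration and are not figures reported by the study, and the dictionary keys are hypothetical labels.

```python
# (workers in millions, total wages in billions of USD), as quoted in the summary.
groups = {
    "office_admin": (18.2, 869.8),
    "management": (11.0, 1326.3),
    "computer_math": (5.2, 563.6),
}


def wage_per_worker(workers_m: float, wages_b: float) -> float:
    """Average annual wages per worker in USD (derived, not from the study)."""
    return (wages_b * 1e9) / (workers_m * 1e6)


# Combine the two under-represented groups named in the summary.
under_workers = groups["office_admin"][0] + groups["management"][0]
under_wages = groups["office_admin"][1] + groups["management"][1]

# Compare them with computer and mathematical occupations.
worker_ratio = under_workers / groups["computer_math"][0]
wage_ratio = under_wages / groups["computer_math"][1]

if __name__ == "__main__":
    for name, (workers_m, wages_b) in groups.items():
        print(f"{name}: ${wage_per_worker(workers_m, wages_b):,.0f} per worker")
    print(f"workers ratio: {worker_ratio:.1f}x, wages ratio: {wage_ratio:.1f}x")
```

On these numbers, office/administrative support and management together account for roughly 5.6 times the workers and 3.9 times the wages of computer and mathematical occupations.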