4 · Latent Space (swyx) · 30h ago

Latent Space: Skill engineering and the case against one-shot AI design

Paul Bakaus discusses 'skill engineering' as a design philosophy for AI-assisted workflows, arguing against fully automated one-shot AI pipelines in favor of keeping humans in the loop. The conversation centers on Impeccable, a tool or approach Bakaus is developing, and the concept of 'loopmaxxing' — iterative human-agent collaboration cycles. The piece addresses why current agents still require human steering to produce high-quality outputs.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Impeccable · Paul Bakaus · Latent Space

Related guides (3)

Latent Space

Latent Space: The AI Practitioner's Pulse

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

4 · Latent Space · 21d ago

AINews: Loopcraft — the art of stacking loops in AI systems

Latent Space's AI News digest highlights a concept called 'Loopcraft' — the art of stacking loops in AI agent or system design — attributed to Peter Steinberger, Boris Cherny, and Andrej Karpathy. The piece appears to be a quiet-day editorial spotlight on a conceptual framework rather than a major release or paper. The framing suggests this is a design pattern or mental model relevant to agentic AI architectures.

Agent and Tool Ecosystem · Boris Cherny · Peter Steinberger · Andrej Karpathy · +1 more

7 · arXiv · cs.AI · 1mo ago

SkillOpt: Systematic Text-Space Optimizer for Self-Evolving Agent Skills

SkillOpt introduces a principled optimization framework for agent skills, treating the skill document as an external trainable state analogous to model weights. A separate optimizer model converts scored rollouts into bounded edits (add/delete/replace) on the skill document and accepts only edits that improve held-out validation scores. Evaluated across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt achieves best or tied performance on all 52 evaluated cells, lifting GPT-5.5 no-skill accuracy by up to 24.8 points inside the Codex agentic loop. Optimized skill artifacts also transfer across model scales and execution environments without further optimization.
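
The accept-only-if-validation-improves idea in this summary can be illustrated with a minimal sketch. It is not SkillOpt's implementation: the `Edit` type, `apply_edit`, `optimize_skill`, and the toy `propose` and `validate` callables are all hypothetical names, the skill document is simplified to a list of lines, and the optimizer model and scored rollouts are stood in for by a fixed queue of candidate edits.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Edit:
    """One bounded edit on a skill document (hypothetical representation)."""
    op: str          # "add", "delete", or "replace"
    index: int       # line position the edit targets
    text: str = ""   # new line content for "add" and "replace"


def apply_edit(skill: List[str], edit: Edit) -> List[str]:
    """Return a new skill document with the edit applied; the input is untouched."""
    lines = list(skill)
    if edit.op == "add":
        lines.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del lines[edit.index]
    elif edit.op == "replace":
        lines[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return lines


def optimize_skill(
    skill: List[str],
    propose: Callable[[List[str]], Optional[Edit]],
    validate: Callable[[List[str]], float],
    steps: int,
) -> Tuple[List[str], float]:
    """Keep a proposed edit only when the held-out validation score improves."""
    best, best_score = list(skill), validate(skill)
    for _ in range(steps):
        edit = propose(best)
        if edit is None:          # the proposer has nothing left to suggest
            break
        candidate = apply_edit(best, edit)
        score = validate(candidate)
        if score > best_score:    # strict improvement required, ties are rejected
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins (assumptions): a fixed queue of edits and a keyword-count validator.
USEFUL = {"Read the task.", "Run the tests.", "Report results."}
queue = iter([
    Edit("add", 1, "Run the tests."),
    Edit("replace", 0, "Ignore the task."),   # lowers the score, so it is rejected
    Edit("add", 2, "Report results."),
])

final_skill, final_score = optimize_skill(
    ["Read the task."],
    propose=lambda current: next(queue, None),
    validate=lambda doc: sum(line in USEFUL for line in doc),
    steps=5,
)
print(final_skill, final_score)
```

In this toy run the harmful replace is discarded while both additions are kept, which is the gating behaviour the summary describes.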

Evaluation and Benchmarking · Agent and Tool Ecosystem · TextGrad · SkillOpt · Trace2Skill · +6 more

5 · Latent Space · 42h ago

Introspection co-founder explains autoresearch and self-improving agent loops

Roland Gavrilescu, co-founder of Introspection, discusses the concept of 'autoresearch' — a feedback loop enabling AI agents to iteratively improve themselves — in a Latent Space interview. The conversation covers agent 'recipes,' self-improving loops, and the continued role of humans in what Gavrilescu frames as a software factory paradigm. The piece offers a practitioner-level view of how agentic research pipelines are being designed and operationalized.

Agent and Tool Ecosystem · Roland Gavrilescu · Introspection · Latent Space

4 · Latent Space · 3d ago

Latent Space highlights 'Loopcraft' concept from Steinberger, Cherny, and Karpathy

Latent Space's AINews digest spotlights a conceptual framework called 'Loopcraft' — described as the art of stacking loops — attributed to Peter Steinberger, Boris Cherny, and Andrej Karpathy. The piece appears to be a commentary or synthesis of ideas from these practitioners about agentic loop architectures or iterative AI workflows. The body is sparse, so the full technical substance is unclear from the excerpt alone.

Agent and Tool Ecosystem · Boris Cherny · Peter Steinberger · Andrej Karpathy · +1 more

5 · The Batch · 7d ago

Andrew Ng outlines three-loop framework for agentic software development

Andrew Ng describes a 'loop engineering' framework for building software with AI coding agents, comprising an agentic coding loop (agent writes/tests/iterates autonomously), a developer feedback loop (human steers at higher product level), and an external feedback loop (user testing, A/B). The piece contextualizes the buzzphrase popularized by Claude Code creator Boris Cherny and OpenClaw creator Peter Steinberger. Ng argues humans retain a 'context advantage' over AI systems that justifies continued human-in-the-loop involvement in product decisions.
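
One way to picture the three loops is as nested iterations, sketched below. The nesting, the function names (`coding_loop`, `developer_loop`, `external_loop`), and the toy callables are assumptions made for illustration; the summary lists the loops but does not say how they are wired together.

```python
from typing import Callable, List, Optional, Tuple

Write = Callable[[str, Optional[str]], str]      # (spec, last failure) -> code
Test = Callable[[str], Optional[str]]            # code -> failure message, or None
Review = Callable[[str, str], Optional[str]]     # (spec, code) -> revised spec, or None
Feedback = Callable[[str, str], str]             # (spec, code) -> next spec


def coding_loop(spec: str, write: Write, test: Test, max_iters: int = 5) -> str:
    """Inner loop: the agent writes, tests, and retries without a human."""
    failure: Optional[str] = None
    code = ""
    for _ in range(max_iters):
        code = write(spec, failure)
        failure = test(code)          # None means the tests passed
        if failure is None:
            break
    return code


def developer_loop(spec: str, review: Review, write: Write, test: Test,
                   max_rounds: int = 3) -> Tuple[str, str]:
    """Middle loop: a developer reviews each build and steers the spec."""
    code = ""
    for _ in range(max_rounds):
        code = coding_loop(spec, write, test)
        revised = review(spec, code)  # None means the developer is satisfied
        if revised is None:
            break
        spec = revised
    return spec, code


def external_loop(spec: str, user_feedback: Feedback, review: Review,
                  write: Write, test: Test, releases: int = 2) -> List[str]:
    """Outer loop: each release feeds user feedback into the next spec."""
    shipped: List[str] = []
    for _ in range(releases):
        spec, code = developer_loop(spec, review, write, test)
        shipped.append(code)
        spec = user_feedback(spec, code)
    return shipped


# Toy stand-ins (assumptions) so the sketch runs end to end.
shipped = external_loop(
    "todo app",
    user_feedback=lambda spec, code: spec + " + export",
    review=lambda spec, code: None if "dark mode" in spec else spec + " + dark mode",
    write=lambda spec, failure: f"{spec}|fixed" if failure else f"{spec}|draft",
    test=lambda code: "tests failed" if code.endswith("draft") else None,
)
print(shipped)
```

The human-facing loops run far less often than the inner one, which matches the summary's point that the developer steers at a higher product level while the agent iterates on its own.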

Enterprise Deployment Patterns · Agent and Tool Ecosystem · DeepLearning.AI · Boris Cherny · Claude Code · +2 more

6 · arXiv · cs.AI · 1mo ago

Systematic Study of Model-Generated Agent Skills Across the Full Skill Lifecycle

This paper presents a utility-grounded evaluation framework for model-generated agent skills, covering the full lifecycle of experience generation, skill extraction, and skill consumption across five agentic task domains. The authors find that while such skills are beneficial on average, they exhibit non-trivial negative transfer, and that skill utility is independent of model scale or baseline task strength. A key finding is that strong extractors are not necessarily strong consumers and vice versa. The work culminates in a 'meta-skill' that guides extraction toward utility-correlated features, consistently improving skill quality and reducing negative transfer.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Model-Generated Agent Skills (paper) · skill extraction · meta-skill · +2 more

4 · One Useful Thing · 1mo ago

Real AI Agents and Real Work

A commentary piece from One Useful Thing examining the practical deployment of AI agents in real work contexts, framing the tension between human-centered work and AI-generated productivity outputs. The piece appears to analyze how autonomous AI agents are changing knowledge work workflows. Published by a Tier 2 source known for applied AI analysis aimed at practitioners and researchers.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · One Useful Thing

5 · One Useful Thing · 1mo ago

The Shape of AI: Jaggedness, Bottlenecks and Salients

A commentary piece from One Useful Thing analyzing the uneven capability profile of current AI systems, framing it through concepts of 'jaggedness' (uneven strengths and weaknesses), 'bottlenecks' (capability constraints), and 'salients' (areas of unexpected advance). The piece uses these concepts to explain why certain AI developments have outsized practical impact. The author references 'Nano Banana Pro' as an illustrative example of a significant capability or product development.

Evaluation and Benchmarking · Enterprise Deployment Patterns · One Useful Thing · Nano Banana Pro