3GitHub Trending (AI/LLM filtered)·22d ago

Awesome Harness Engineering: Curated List for AI Agent Infrastructure

A GitHub repository aggregating resources on AI agent harness engineering, covering tools, patterns, evaluations, memory systems, MCP (Model Context Protocol), permissions, observability, and orchestration. The list has accumulated 1,318 stars with 39 added today, indicating moderate community traction. It serves as a reference index rather than original research or tooling.

Evaluation and Benchmarking Agent and Tool Ecosystem ai-boost/awesome-harness-engineering Model Context Protocol

Related guides (3)

Model Context ProtocolConcept

Model Context Protocol (MCP): The Universal Plug for AI Agents

Read asBeginner In-depth

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Read asBeginner In-depth

Evaluation and BenchmarkingTopic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Read asBeginner In-depth

Related events (8)

6arXiv · cs.CL·1mo ago·source ↗

Code as Agent Harness: A Survey of Code as Operational Substrate for Agentic AI Systems

This survey paper introduces the concept of 'code as agent harness,' framing code not merely as output but as the operational infrastructure for LLM-based agents—covering reasoning, action, environment modeling, and execution-based verification. The authors organize the analysis across three layers: harness interface, harness mechanisms (planning, memory, tool use, feedback control), and scaling to multi-agent systems. Applications span coding assistants, GUI/OS automation, embodied agents, scientific discovery, and enterprise workflows. Open challenges include evaluation beyond task success, verification under incomplete feedback, and human oversight for safety-critical actions.

Evaluation and Benchmarking AI Safety Research embodied agents large language models Code as Agent Harness +6 more

6arXiv · cs.LG·25d ago·source ↗

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

This paper argues that the next major bottleneck in agentic AI is system-level design—what the authors call 'scaling the harness'—rather than continued model scaling alone. The agent harness encompasses memory substrates, context constructors, skill-routing layers, orchestration loops, and verification/governance components that together translate model capability into long-horizon behavior. The authors identify three core bottlenecks (context governance, trustworthy memory, dynamic skill routing) and propose harness-level benchmarks measuring trajectory quality, memory hygiene, and verification cost. They introduce CheetahClaws, a Python-native reference harness, and compare it against Claude Code and OpenClaw.

Evaluation and Benchmarking Inference Economics SafeRL-Lab dynamic skill routing Scaling the Harness (paper)+8 more

3Github Trending·14d ago·source ↗

danielmiessler/Personal_AI_Infrastructure: agentic AI infrastructure framework in TypeScript

Daniel Miessler's Personal_AI_Infrastructure is a TypeScript project on GitHub framed as agentic AI infrastructure for augmenting human capabilities, currently trending with ~14,925 stars and 63 new stars today. The repository appears to be a personal AI agent harness or orchestration layer. Limited detail is available from the trending listing alone, but the star count indicates meaningful community traction.

Agent and Tool Ecosystem Daniel Miessler Atlas

4Github Trending·25d ago·source ↗

shareAI-lab/learn-claude-code: Minimal Claude Code-style Agent Harness in Python

A GitHub repository implementing a minimal 'nano' version of a Claude Code-style agent harness built from scratch in Python, using Bash as the primary tool interface. The project has accumulated 62,802 stars with 262 added today, indicating significant community interest. It serves as an educational resource for understanding how agentic coding assistants like Claude Code are structured at a low level.

Agent and Tool Ecosystem shareAI-lab Claude Code shareAI-lab/learn-claude-code +1 more

5Openai Blog·1mo ago·source ↗

Harness Engineering: Leveraging Codex in an Agent-First World

OpenAI published a technical post by Ryan Lopopolo describing how Codex is being used in an agent-first engineering workflow. The piece appears to cover practical patterns for integrating Codex into software development pipelines where AI agents take a more central role. As a Tier 1 source announcement, it likely details real-world engineering practices and lessons from deploying Codex at scale.

Enterprise Deployment Patterns Agent and Tool Ecosystem Ryan Lopopolo OpenAI Codex

5Github Trending·15d ago·source ↗

Microsoft agent-framework: open-source library for building and orchestrating AI agents

Microsoft has published an open-source framework on GitHub for building, orchestrating, and deploying AI agents and multi-agent workflows, with support for both Python and .NET. The repository has accumulated 11,061 stars. It represents Microsoft's entry into the agent harness tooling space alongside existing frameworks like LangChain and AutoGen.

Agent and Tool Ecosystem Microsoft microsoft/agent-framework

5arXiv · cs.CL·9d ago·source ↗

Claw-SWE-Bench: A benchmark for evaluating agent harnesses on multilingual coding tasks

Researchers introduce Claw-SWE-Bench, a multilingual SWE-bench-style benchmark and adapter protocol designed to fairly compare heterogeneous agent harnesses ("claws") on GitHub issue-resolution tasks. The benchmark contains 350 instances across 8 languages and 43 repositories, with an 80-instance Lite subset for cost-efficient validation. Key findings show adapter design dominates raw model choice: a minimal adapter scores 19.1% Pass@1 versus 73.4% for a full adapter using the same GLM 5.1 backbone, and harness choice and model choice each shift Pass@1 by roughly 27-29 percentage points. The work also introduces cost accounting as a first-class evaluation axis alongside accuracy.

Evaluation and Benchmarking Inference Economics SWE-Bench Multilingual OpenClaw SWE-Bench Verified +4 more

5Github Trending·12d ago·source ↗

LangChain releases DeepAgents: a batteries-included agent harness

LangChain has published DeepAgents, a Python agent harness described as 'batteries-included' for building AI agents. The repository has accumulated 24,206 GitHub stars with 105 added today, indicating significant community traction. The project extends LangChain's position in the agent tooling ecosystem.

Agent and Tool Ecosystem DeepAgents LangChain