Entity · paper

Code as Agent Harness

paper · active · code-as-agent-harness-31572567 · 2 events · first seen May 19, 2026

Aliases: Code as Agent Harness

Co-occurring entities

multi-agent systems · Executable Operational Cognition · HarnessMutation · embodied agents · large language models · multi-agent coordination · GUI/OS automation · execution-based verification

More like this (12)

Transformers Code Agent · CodeAgents · code-as-action agents · Recursive Agent Harnesses · OpenHarness · agent harness · coding agents · Agent-S · FinHarness · Code with Claude · Meta Harness · AI Agents

Recent events (2)

5 · arXiv · cs.AI · May 27, 2026

Governed Evolution of Agent Runtimes through Executable Operational Cognition

This paper proposes a framework for governed runtime evolution in multi-agent systems, formalizing agent-generated code artifacts as persistent runtime capabilities rather than transient outputs. It introduces HarnessMutation, a lifecycle-aware mechanism for runtime adaptation operating under explicit validation, traceability, evaluation, and rollback constraints. The framework models agent self-modification as a bounded, observable, and auditable process over persistent operational memory, building on prior 'Code as Agent Harness' work.

AI Safety Research · Agent and Tool Ecosystem · Executable Operational Cognition · Code as Agent Harness · multi-agent systems

6 · arXiv · cs.CL · May 19, 2026

Code as Agent Harness: A Survey of Code as Operational Substrate for Agentic AI Systems

This survey paper introduces the concept of 'code as agent harness,' framing code not merely as output but as the operational infrastructure for LLM-based agents—covering reasoning, action, environment modeling, and execution-based verification. The authors organize the analysis across three layers: harness interface, harness mechanisms (planning, memory, tool use, feedback control), and scaling to multi-agent systems. Applications span coding assistants, GUI/OS automation, embodied agents, scientific discovery, and enterprise workflows. Open challenges include evaluation beyond task success, verification under incomplete feedback, and human oversight for safety-critical actions.

Evaluation and Benchmarking · AI Safety Research · embodied agents · large language models · Code as Agent Harness