arXiv cs.CL (Computation and Language) · 11d ago

RedAct framework protects procedural skills in agent execution traces via selective redaction and watermarking

Researchers introduce RedAct, a framework for releasing agent execution traces without exposing proprietary procedural skills (tool invocations, decision logic, error-recovery strategies). The system localizes sensitive information, rewrites traces while preserving audit-critical evidence, and embeds behavioral watermarks for provenance tracking. To evaluate the approach, the authors construct CapTraceBench, a benchmark of 75 long-horizon tasks and 154 skills across seven domains. RedAct reduces normalized skill transfer from 44.7–67.1% on raw traces to below the no-skill baseline, while watermark detection achieves 93.6–100% true positive rate with under 2% false alarms.

Evaluation and Benchmarking · AI Safety Research · Agent and Tool Ecosystem · RedAct · CapTraceBench · Xu Shuwen
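
The selective-redaction idea in the RedAct summary above can be illustrated with a minimal sketch. It is not RedAct's implementation: the `Step` schema, the `redact_trace` helper, and the choice of which fields count as audit-critical are all hypothetical, and the set of sensitive steps is supplied by hand here rather than localized automatically.

```python
from dataclasses import dataclass, replace

# Hypothetical trace schema; field names are illustrative assumptions,
# not the format used by RedAct or CapTraceBench.
@dataclass(frozen=True)
class Step:
    index: int       # audit-critical: position in the trace
    tool: str        # tool invocation; may reveal a procedural skill
    arguments: str   # may reveal a procedural skill
    rationale: str   # decision logic; may reveal a procedural skill
    status: str      # audit-critical: "ok" or "error"

REDACTED = "[REDACTED]"

def redact_trace(trace, sensitive_steps):
    """Mask skill-revealing fields on flagged steps; keep index and status."""
    out = []
    for step in trace:
        if step.index in sensitive_steps:
            step = replace(step, tool=REDACTED, arguments=REDACTED,
                           rationale=REDACTED)
        out.append(step)
    return out

trace = [
    Step(0, "search", "q='invoice 2291'", "look up the record first", "ok"),
    Step(1, "db_query", "retry with backoff", "recover from timeout", "error"),
    Step(2, "db_query", "plain query", "retry succeeded", "ok"),
]
redacted = redact_trace(trace, sensitive_steps={1})
```

In this toy version an auditor can still see that step 1 failed and that the run recovered, while the recovery procedure itself is hidden. Rewriting traces so that they stay useful to an auditor, and embedding a watermark, are beyond what this sketch attempts.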

Related guides (3)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

arXiv · cs.AI · 1mo ago

Reversa: A Multi-Agent Framework for Reverse Engineering Legacy Software into AI-Readable Operational Specifications

Reversa is a multi-agent pipeline framework that converts legacy software systems into traceable operational specifications suitable for use by AI coding agents. The framework employs specialized agents for surface mapping, module analysis, implicit rule extraction, architecture synthesis, and specification review, with mechanisms for traceability, confidence marking, and gap preservation. An exploratory case study on migrating an ATM system from COBOL to Go produced 517 confidence-indexed claims, 53 Gherkin parity scenarios, and a partial reconstruction plan, though final validation was not completed. The system is distributed as a Node.js CLI and is positioned relative to literature on reverse engineering, LLM-based documentation, and software agents.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · SHA-256 · Go (programming language) · Gherkin

arXiv · cs.CL · 25d ago

ProAct: Proactive Agent Architecture Using Idle-Time Compute to Anticipate User Needs

ProAct is a proactive agent architecture that uses idle time between user interactions to predict upcoming needs, pre-fetch information, and resolve knowledge gaps before queries are issued. The system analyzes dialogue history and persistent memory to iteratively acquire relevant information in advance. Evaluated on the new ProActEval benchmark (200 scenarios, 40 domains), ProAct reduces required turns by 14.8%, user effort by 11.7%, and hallucination rates by 28.1% compared to reactive baselines. The work also achieves state-of-the-art reflective accuracy on MemBench.

Evaluation and Benchmarking · Inference Economics · ProActEval · idle-time compute · ProAct

arXiv · cs.CL · 15d ago

DataCOPE: Unsupervised skill discovery framework for data-analytic agents

Researchers introduce DataCOPE, an unsupervised verifier-guided framework for discovering reusable procedural skills in data-analytic agents without labeled supervision or parameter updates. The system coordinates three components—a data-analytic agent, an unsupervised verifier, and a skill manager for contrastive skill distillation—with task-specific verifier instantiations for report-style and reasoning-style analysis. Evaluated on Deep Data Research and DABStep benchmarks, DataCOPE improves mean scores by 9.71% and 32.30% respectively across four model settings. The approach addresses a key bottleneck in agentic data analysis: acquiring reliable skill supervision at scale.

Evaluation and Benchmarking · Agent and Tool Ecosystem · DABStep · Deep Research · DataCOPE

OpenAI Blog · 1mo ago

Deep Research System Card

OpenAI has published the system card for its Deep Research capability, detailing pre-release safety work including external red teaming and frontier risk evaluations conducted under the Preparedness Framework. The document outlines identified risk areas and the mitigations implemented before deployment. This is the formal safety disclosure accompanying the Deep Research product launch.

Frontier Model Releases · AI Safety Research · Deep Research · Preparedness Framework · OpenAI

arXiv · cs.CL · 5d ago

RePro: Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Researchers introduce RePro (Retrospective Progress-Aware Training), a framework addressing the gap between step-wise RL optimization and metacognitive task-progress awareness in LLM agents. The approach uses a forward-then-reflect rollout paradigm where agents execute actions online and then retrospectively assess step-wise progress given the completed trajectory and known outcome. Evaluated on WebShop, ALFWorld, and Sokoban, RePro achieves up to 12% absolute success rate gains over baseline Qwen-family models without requiring continuous external supervision.

Agent and Tool Ecosystem · Alignment and RLHF · ALFWorld · Sokoban · RePro
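
The forward-then-reflect rollout described in the RePro summary above can be pictured with a toy sketch. Everything in it is a hypothetical stand-in: a 1-D environment replaces the LLM agent's tasks, and a distance-to-goal rule replaces the agent's own retrospective assessment. The names `rollout` and `reflect` are illustrative and not taken from the paper.

```python
# Toy forward-then-reflect rollout (illustrative assumptions throughout).

def rollout(start, actions):
    """Forward phase: execute actions online and record every visited state."""
    states = [start]
    for action in actions:
        states.append(states[-1] + action)
    return states

def reflect(states, goal):
    """Reflect phase: with the finished trajectory and outcome known,
    label each step +1 if it moved closer to the goal, otherwise -1."""
    success = states[-1] == goal
    labels = []
    for before, after in zip(states, states[1:]):
        moved_closer = abs(goal - after) < abs(goal - before)
        labels.append(1 if moved_closer else -1)
    return success, labels

states = rollout(start=0, actions=[1, 1, -1, 1, 1])
success, labels = reflect(states, goal=3)
print(success, labels)  # True [1, 1, -1, 1, 1]
```

The point of the two-phase split is that step labels are assigned only after the outcome is known, so the backward step at index 2 can be identified as a detour even though the episode as a whole succeeded.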

GitHub Trending · 1mo ago

agent-skills: Secure Validated Skill Registry for AI Coding Agents

agent-skills is a TypeScript-based open-source skill registry that extends AI coding agents, including Claude Code, Cursor, GitHub Copilot, and Antigravity, with validated, reusable capabilities. It provides a structured way to add skills across multiple coding agent platforms, with a focus on security and validation. At the time of listing, the repository had 3,767 total stars, 225 of them added that day.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Cursor · Claude Code · Antigravity

Hugging Face Blog · 1mo ago

Red-Teaming Large Language Models

This Hugging Face blog post introduces red-teaming as a safety evaluation methodology for large language models, explaining how adversarial testing can surface harmful outputs, biases, and failure modes before deployment. It covers techniques for systematically probing LLMs to elicit problematic behaviors and discusses the role of red-teaming in responsible AI development. The post serves as an educational overview aimed at practitioners working on LLM safety.

Evaluation and Benchmarking · AI Safety Research · large language models · Hugging Face · red-teaming

arXiv · cs.AI · 25d ago

VeriTrace: Cognitive-Graph Framework with Explicit Regulatory Loops for Deep Research Agents

VeriTrace introduces a cognitive-graph framework for deep research agents that replaces implicit LLM reasoning over intermediate representations with three explicit regulatory loops: interpretive update, deviation feedback, and schema revision. The system addresses contamination and error propagation in evolving mental models during complex multi-step research tasks. Using Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 pp on DeepResearch Bench Insight and by 5.9 pp in Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DeepResearch Bench.

Frontier Model Releases · Evaluation and Benchmarking · DeepSeek V4 · cognitive-graph · DeepResearch Bench