Entity · benchmark

CapTraceBench

benchmark · active · captracebench-b643dab7 · 1 event · first seen Jun 10, 2026

Aliases: CapTraceBench

Co-occurring entities

More like this (12)

MemTraceBench · TriggerBench · CursorBench · CoTrace · SorryBench · RepoBench · ProgramBench · SimpleTrace · KernelBench · TriViewBench · TraceLab · MemBench

Recent events (1)

5 · arXiv · cs.CL · Jun 10, 2026

RedAct framework protects procedural skills in agent execution traces via selective redaction and watermarking

Researchers introduce RedAct, a framework for releasing agent execution traces without exposing proprietary procedural skills (tool invocations, decision logic, error-recovery strategies). The system localizes sensitive information, rewrites traces while preserving audit-critical evidence, and embeds behavioral watermarks for provenance tracking. To evaluate the approach, the authors construct CapTraceBench, a benchmark of 75 long-horizon tasks and 154 skills across seven domains. RedAct reduces normalized skill transfer from 44.7–67.1% on raw traces to below the no-skill baseline, while watermark detection achieves a true positive rate of 93.6–100% with a false-alarm rate under 2%.
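For readers unfamiliar with the two detection metrics quoted above, the following is a minimal sketch of how a true positive rate and a false-alarm rate are conventionally computed from labeled detector decisions. The function name `detection_rates` and the toy labels are hypothetical illustrations; they are not taken from RedAct or CapTraceBench, and the paper's own evaluation protocol may differ.

```python
from typing import Sequence, Tuple


def detection_rates(
    has_watermark: Sequence[bool], flagged: Sequence[bool]
) -> Tuple[float, float]:
    """Return (true_positive_rate, false_alarm_rate) for a binary detector.

    has_watermark[i] is the ground truth for trace i (hypothetical labels);
    flagged[i] is whether the detector reported a watermark in trace i.
    """
    if len(has_watermark) != len(flagged):
        raise ValueError("label and decision sequences must have equal length")

    positives = sum(1 for w in has_watermark if w)
    negatives = len(has_watermark) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one watermarked and one clean trace")

    # Watermarked traces that the detector flagged.
    true_positives = sum(1 for w, f in zip(has_watermark, flagged) if w and f)
    # Clean traces that the detector flagged anyway (false alarms).
    false_alarms = sum(1 for w, f in zip(has_watermark, flagged) if not w and f)

    return true_positives / positives, false_alarms / negatives


# Toy data only: 4 watermarked traces followed by 4 clean ones.
truth = [True, True, True, True, False, False, False, False]
decisions = [True, True, True, False, False, False, False, True]
tpr, far = detection_rates(truth, decisions)
print(f"TPR={tpr:.2f}, false-alarm rate={far:.2f}")
```

Under this convention, a reported 93.6–100% true positive rate with under 2% false alarms would mean nearly all watermarked traces are flagged while fewer than 2 in 100 clean traces are flagged by mistake.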

Evaluation and Benchmarking · AI Safety Research · RedAct · CapTraceBench · Xu Shuwen · +1 more