Almanac
CapTraceBench

benchmark · active · provisional · captracebench-b643dab7 · 1 event · first seen 7d ago

Aliases: CapTraceBench

Recent events (1)

arXiv · cs.CL · 7d ago

RedAct framework protects procedural skills in agent execution traces via selective redaction and watermarking

Researchers introduce RedAct, a framework for releasing agent execution traces without exposing proprietary procedural skills (tool invocations, decision logic, and error-recovery strategies). The system localizes sensitive information, rewrites traces while preserving audit-critical evidence, and embeds behavioral watermarks for provenance tracking. To evaluate the approach, the authors construct CapTraceBench, a benchmark of 75 long-horizon tasks and 154 skills across seven domains. RedAct reduces normalized skill transfer from 44.7–67.1% on raw traces to below the no-skill baseline, while watermark detection achieves a true positive rate of 93.6–100% with a false-alarm rate under 2%.