Cybersecurity Task Evaluation
cybersecurity-task-evaluation-4d735445 · 1 event · first seen 16d ago
Aliases: Cybersecurity Task Evaluation
Recent events (1)
Stateful Online Monitoring Catches Distributed Agent Attacks via Cross-Account Clustering
Researchers demonstrate the first known distributed agent attack: a multi-agent scaffold that splits harmful cybersecurity tasks across many user accounts to evade per-transcript safety monitors, cutting detection rates to roughly one-fifth of those for standard attacks. As a defense, they develop a stateful online monitor that clusters weak suspiciousness signals across many agent transcripts in real time and escalates to a full LM-based review only rarely. In large-scale evaluations on simulated datacenter traffic, the monitor Pareto-dominates standard monitors, catching distributed attacks 30% earlier while adding negligible latency overhead for ~99% of traffic. The system also incidentally catches standard jailbreaks, since adaptive attackers tend to reuse attack variants across accounts.
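The cross-account clustering idea can be illustrated with a minimal Python sketch. This is not the researchers' implementation: the class name StatefulMonitor, the token-overlap (Jaccard) similarity, the thresholds, and the example transcripts are all hypothetical choices made for illustration. The sketch only shows how individually weak per-transcript scores can add up to an escalation once similar requests arrive from several accounts.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    tokens: set                                  # union of tokens seen in this cluster
    accounts: set = field(default_factory=set)   # distinct accounts contributing
    total_score: float = 0.0                     # accumulated weak suspicion
    escalated: bool = False                      # escalate each cluster at most once


def jaccard(a: set, b: set) -> float:
    """Token-set overlap, used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class StatefulMonitor:
    """Hypothetical monitor: keeps state across transcripts and accounts."""

    def __init__(self, sim_threshold=0.3, score_threshold=1.0, min_accounts=3):
        # All three thresholds are illustrative assumptions, not published values.
        self.sim_threshold = sim_threshold
        self.score_threshold = score_threshold
        self.min_accounts = min_accounts
        self.clusters = []

    def observe(self, account_id: str, text: str, weak_score: float) -> bool:
        """Record one transcript; return True if it triggers escalation."""
        tokens = set(text.lower().split())
        # Greedy online clustering: join the most similar existing cluster,
        # or open a new one if nothing is similar enough.
        best = max(self.clusters, key=lambda c: jaccard(tokens, c.tokens), default=None)
        if best is None or jaccard(tokens, best.tokens) < self.sim_threshold:
            best = Cluster(tokens=set(tokens))
            self.clusters.append(best)
        else:
            best.tokens |= tokens
        best.accounts.add(account_id)
        best.total_score += weak_score
        # Escalate (e.g. to a full LM-based review) only when weak signals
        # accumulate across enough distinct accounts.
        if (not best.escalated
                and best.total_score >= self.score_threshold
                and len(best.accounts) >= self.min_accounts):
            best.escalated = True
            return True
        return False


monitor = StatefulMonitor()
events = [
    ("a1", "scan internal subnet for open ports", 0.4),
    ("a2", "scan subnet for open ports quickly", 0.4),
    ("b1", "write a poem about spring", 0.05),
    ("a3", "scan the subnet open ports list", 0.4),
]
flags = [monitor.observe(acct, text, score) for acct, text, score in events]
print(flags)
```

In this toy run no single transcript exceeds the score threshold on its own; escalation fires only on the third similar request, once three distinct accounts have contributed to the same cluster. A production monitor would presumably use learned embeddings and time-windowed state rather than raw token overlap.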