Attack Success Rate
other · active
attack-success-rate-f002435e · 1 event · first seen 25d ago · Aliases: Attack Success Rate
Recent events (1)
Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety
Researchers introduce 'Boiling the Frog,' a multi-turn safety benchmark that tests whether tool-using AI agents in corporate and office settings are susceptible to incremental attacks, which open with benign requests and only later introduce harmful payloads. The benchmark uses stateful multi-turn evaluation and a three-level operational risk taxonomy grounded in the EU AI Act and its GPAI Code of Practice. Across nine models, the aggregate strict attack success rate (ASR) is 44.4%, ranging from 20.5% for Claude Haiku 4.5 to 92.9% for Gemini 3.1 Flash Lite; loss-of-control scenarios reach a category-level ASR of 93.3%.
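As a rough illustration of how an attack success rate of this kind can be tallied, the sketch below computes a pooled rate and per-model or per-category rates from a list of attack outcomes. The record fields (`model`, `category`, `success`), the function name, and the toy data are all hypothetical and are not taken from the benchmark. The summary does not define what "strict" success means or whether the aggregate figure is pooled over all attempts or averaged over models, so the pooled ratio here is only one possible reading.

```python
from collections import defaultdict


def attack_success_rate(records, key=None):
    """Return ASR as a percentage.

    With key=None, pool every record into a single rate.
    With key set (e.g. "model" or "category"), return one rate per group.
    """
    if key is None:
        if not records:
            raise ValueError("no records to score")
        successes = sum(1 for r in records if r["success"])
        return 100.0 * successes / len(records)

    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {name: attack_success_rate(group) for name, group in groups.items()}


# Toy outcomes invented for illustration only; NOT benchmark data.
records = [
    {"model": "model-a", "category": "cat-1", "success": True},
    {"model": "model-a", "category": "cat-2", "success": False},
    {"model": "model-b", "category": "cat-1", "success": True},
    {"model": "model-b", "category": "cat-2", "success": True},
]

pooled = attack_success_rate(records)
by_model = attack_success_rate(records, key="model")
by_category = attack_success_rate(records, key="category")
```

Grouping by a different key is what separates a per-model figure from a category-level one: the same outcomes are counted, only the denominator changes.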