AdversaBench
benchmark · active · provisional
adversabench-578f59aa · 1 event · first seen 19h ago
Aliases: AdversaBench
AdversaBench: Automated LLM red-teaming pipeline with multi-judge confirmation and cross-model transferability
AdversaBench is a new end-to-end red-teaming pipeline that mutates seed prompts using five structured operators and confirms failures via a three-judge panel with a meta-judge tiebreaker. Experiments on 45 seeds across reasoning, instruction-following, and tool-use categories produced confirmed failures for every seed. Key findings include sharp variation in operator effectiveness by category, misleading binary failure rates, judge agreement metrics distorted by label skew, and zero-shot transferability of adversarial prompts from Llama 3.1 8B to Llama 3.3 70B. Code and dataset are publicly released.
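The point about judge agreement being distorted by label skew can be illustrated with a small sketch. The summary does not say which agreement metric AdversaBench reports, so Cohen's kappa is an assumption here, and the two judge label lists are invented for illustration rather than taken from the released dataset. When nearly every item is labeled "fail", two judges can agree on most items by default while agreeing no better than chance on the rare "pass" cases.

```python
def percent_agreement(a, b):
    """Fraction of items on which two judges give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    labels = set(a) | set(b)
    # Chance agreement: probability both judges pick the same label
    # if each labeled independently at their own marginal rates.
    p_e = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one label is ever used
    return (p_o - p_e) / (1 - p_e)


# Hypothetical, heavily skewed labels for 100 items (not AdversaBench data):
# both judges say "fail" on 90 items and disagree on the remaining 10.
judge_a = ["fail"] * 90 + ["fail"] * 5 + ["pass"] * 5
judge_b = ["fail"] * 90 + ["pass"] * 5 + ["fail"] * 5

raw = percent_agreement(judge_a, judge_b)
kappa = cohen_kappa(judge_a, judge_b)
print(f"raw agreement: {raw:.2f}, Cohen's kappa: {kappa:.3f}")
```

In this made-up case the judges agree on 90% of items, yet kappa is slightly below zero because chance alone would already produce about 90.5% agreement at these label rates. That is one way a headline agreement figure can overstate how reliable a judge panel is.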