Almanac
technique

adversarial pragmatics

technique · active · provisional · adversarial-pragmatics-fee8d00c · 1 event · first seen 32h ago

Aliases: adversarial pragmatics

Recent events (1)

arXiv · cs.CL · 32h ago

Adversarial Pragmatics benchmark for AI safety evaluation under instruction conflict and ambiguity

A new arXiv preprint introduces 'adversarial pragmatics' as both a benchmark and an annotation protocol for evaluating language model behavior under linguistically complex conditions: instruction conflict, embedded commands, quotation, scope ambiguity, deixis, and multi-turn agentic transcripts. The work critiques existing safety benchmarks for collapsing distinct failure modes into pass/fail labels. In their place it proposes a taxonomy with an 18-item seed benchmark and an expert-evaluation protocol that distinguishes task success, policy compliance, safety risk, refusal outcome, and evaluator confidence. The framework is designed to validate safety evaluations, LLM judges, gold-set construction, and prompt-injection tests. The contribution is primarily methodological: it targets the infrastructure of safety evaluation rather than model capabilities directly.
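The contrast between a pass/fail label and a multi-dimensional annotation can be illustrated with a small record type. This is a minimal Python sketch, not the preprint's schema, which the summary does not give: every name here (`Annotation`, `Phenomenon`, `Refusal`, `collapse_to_pass_fail`, `failure_modes`), the 0 to 3 risk scale, the [0, 1] confidence scale, the 0.5 confidence threshold, and the item IDs are hypothetical. It shows two responses that receive the same "pass" once collapsed but remain distinguishable on the separate dimensions.

```python
from dataclasses import dataclass
from enum import Enum


class Phenomenon(Enum):
    # The six condition types named in the summary; enum member names are hypothetical.
    INSTRUCTION_CONFLICT = "instruction conflict"
    EMBEDDED_COMMAND = "embedded command"
    QUOTATION = "quotation"
    SCOPE_AMBIGUITY = "scope ambiguity"
    DEIXIS = "deixis"
    AGENTIC_TRANSCRIPT = "multi-turn agentic transcript"


class Refusal(Enum):
    # Assumed three-way refusal outcome.
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"


@dataclass(frozen=True)
class Annotation:
    """One expert judgment of one model response (hypothetical schema)."""
    item_id: str
    phenomenon: Phenomenon
    task_success: bool
    policy_compliant: bool
    safety_risk: int             # assumed ordinal scale: 0 (none) to 3 (severe)
    refusal: Refusal
    evaluator_confidence: float  # assumed score in [0, 1]

    def __post_init__(self) -> None:
        # Reject values outside the assumed scales.
        if not 0 <= self.safety_risk <= 3:
            raise ValueError("safety_risk must be in 0..3")
        if not 0.0 <= self.evaluator_confidence <= 1.0:
            raise ValueError("evaluator_confidence must be in [0, 1]")


def collapse_to_pass_fail(a: Annotation) -> bool:
    """A single pass/fail bit of the kind the preprint critiques (illustrative rule)."""
    return a.policy_compliant and a.safety_risk == 0


def failure_modes(a: Annotation, low_confidence: float = 0.5) -> list[str]:
    """List the distinctions that the pass/fail bit discards."""
    modes = []
    if not a.task_success and a.refusal is not Refusal.NONE:
        modes.append("refused, task not completed")
    elif not a.task_success:
        modes.append("task not completed")
    if not a.policy_compliant:
        modes.append("policy violation")
    if a.safety_risk > 0:
        modes.append(f"safety risk {a.safety_risk}")
    if a.evaluator_confidence < low_confidence:
        modes.append("low evaluator confidence")
    return modes


# Two hypothetical annotations on quotation items.
helpful = Annotation("seed-01", Phenomenon.QUOTATION, True, True, 0, Refusal.NONE, 0.9)
refused = Annotation("seed-02", Phenomenon.QUOTATION, False, True, 0, Refusal.FULL, 0.4)

# Both collapse to "pass", yet only the second carries failure modes.
for a in (helpful, refused):
    print(a.item_id, collapse_to_pass_fail(a), failure_modes(a))
```

Keeping the dimensions as separate fields lets a judge or gold set be checked against each one independently; `collapse_to_pass_fail` exists only to show what a single bit throws away.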