StrongReject

benchmark · active · provisional · strongreject-8a6795f7 · 1 event · first seen 5h ago

Aliases: StrongReject

Recent events (1)

5 · arXiv · cs.CL · 5h ago

Systematic comparison of encoder vs. decoder safety judges for LLM adversarial evaluation

A new arXiv preprint evaluates whether fine-tuned encoder classifiers from the ModernBERT family (ModernBERT and Ettin) can replace LLM-based safety judges for detecting harmful model outputs in user-model conversations. The study benchmarks these encoders against rule-based methods, fine-tuned LLM classifiers, and LLM judges, including Llama Guard 3 and 4, ShieldGemma, StrongReject, and Claude as a judge, across multiple adversarial attack types. Results are reported as F1, false negative rate, and precision-recall, broken down by attack technique, and the study offers practical guidance on cost and latency tradeoffs for production safety pipelines.
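To make the reported metrics concrete, the sketch below computes F1 and false negative rate (FNR) for a binary safety judge, both overall and broken down by attack technique. It treats "harmful" as the positive class. The `Judgment` record, the attack labels `"roleplay"` and `"encoding"`, and the toy labels are hypothetical illustrations, not data or code from the preprint.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgment:
    attack: str    # hypothetical attack-technique label
    harmful: bool  # ground truth: the model output is harmful
    flagged: bool  # judge verdict: the output was flagged as harmful


def scores(judgments):
    """Return (f1, fnr), treating 'harmful' as the positive class."""
    tp = sum(j.harmful and j.flagged for j in judgments)
    fp = sum((not j.harmful) and j.flagged for j in judgments)
    fn = sum(j.harmful and (not j.flagged) for j in judgments)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # FNR = FN / (FN + TP) = 1 - recall: the share of harmful outputs the judge missed
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return f1, fnr


def by_attack(judgments):
    """Group judgments by attack technique and score each group separately."""
    groups = defaultdict(list)
    for j in judgments:
        groups[j.attack].append(j)
    return {attack: scores(group) for attack, group in sorted(groups.items())}


# Hypothetical toy data, for illustration only
SAMPLE = [
    Judgment("roleplay", harmful=True, flagged=True),
    Judgment("roleplay", harmful=True, flagged=False),
    Judgment("roleplay", harmful=False, flagged=False),
    Judgment("encoding", harmful=True, flagged=True),
    Judgment("encoding", harmful=False, flagged=True),
    Judgment("encoding", harmful=True, flagged=True),
]

if __name__ == "__main__":
    f1, fnr = scores(SAMPLE)
    print(f"overall: F1={f1:.2f} FNR={fnr:.2f}")
    for attack, (f1, fnr) in by_attack(SAMPLE).items():
        print(f"{attack}: F1={f1:.2f} FNR={fnr:.2f}")
```

On this toy data the overall F1 is 0.75 with an FNR of 0.25, but the per-attack breakdown shows that the judge misses half of the harmful "roleplay" outputs while missing none of the "encoding" ones. This kind of difference is what a per-technique breakdown can surface and an aggregate score can hide.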