model
Ettin
model · active · provisional
ettin-7aa75982 · 1 event · first seen 5h ago
Aliases: Ettin
Recent events (1)
Systematic comparison of encoder vs. decoder safety judges for LLM adversarial evaluation
A new arXiv preprint evaluates whether fine-tuned encoder classifiers from the ModernBERT family (ModernBERT and Ettin) can replace LLM-based safety judges for detecting harmful outputs in user-model conversations. The study benchmarks these encoders against rule-based methods, fine-tuned LLM classifiers, and LLM judges including LlamaGuard 3/4, ShieldGemma, StrongReject, and Claude-as-a-judge across multiple adversarial attack types. It reports F1, false negative rate, and precision-recall, broken down by attack technique, and offers practical guidance on cost-latency tradeoffs for production safety pipelines.
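For readers unfamiliar with the metrics named in the summary, here is a minimal sketch of how F1 and false negative rate could be computed for a binary safety judge. It is not the preprint's evaluation code: the names `JudgeMetrics` and `score_judge` are hypothetical, and the label convention (1 = harmful, 0 = safe) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JudgeMetrics:
    precision: float
    recall: float
    f1: float
    false_negative_rate: float


def score_judge(y_true: Sequence[int], y_pred: Sequence[int]) -> JudgeMetrics:
    """Score a binary safety judge (hypothetical helper).

    Assumed convention: 1 = harmful output, 0 = safe output.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # harmful, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # harmful, missed

    # Ratios with an empty denominator are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # False negative rate: share of harmful outputs the judge let through.
    fnr = fn / (tp + fn) if (tp + fn) else 0.0

    return JudgeMetrics(precision, recall, f1, fnr)


if __name__ == "__main__":
    # Toy labels, not data from the study.
    print(score_judge([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

False negative rate is tracked separately from F1 because, for a safety judge, a missed harmful output is usually the costlier error. A per-attack breakdown like the one the summary describes would amount to calling such a function once per attack-technique subset.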