paper

EvalSafetyGap

paper · active · provisional · evalsafetygap-4b69c534 · 1 event · first seen 3d ago

Aliases: EvalSafetyGap

More like this (12)

ValueEval · Cue Visibility Gap · Dynamic-Probabilistic Consistency Gap · G-Eval · sim-to-real gap · Safety Usage Dashboard · AI Safety Level Standards · Safe Exploration Benchmark · Guard Rail Validation · joint safety evaluation · GenEval · Abstraction Gap

Recent events (1)

5 · arXiv · cs.AI · 3d ago

EvalSafetyGap: Conceptual framework linking LLM evaluation failures to safety measurement gaps

A new arXiv preprint introduces EvalSafetyGap, a hybrid survey and conceptual framework arguing that benchmark scores, reward-model signals, and safety metrics can improve while the underlying properties they measure remain unverified. The paper synthesizes eight evidence streams spanning 2018–2026 and introduces two analytical constructs, an Instability Decomposition and an Alignment Trilemma, to structure comparisons between evaluation-side and alignment-side proxy failures under optimization pressure. A ten-model audit finds no statistically significant association between capability and adversarial robustness, and suggests that the apparent open-versus-closed-model safety gap is driven more by governance and disclosure practices than by behavioral robustness. The work proposes a shared vocabulary for dynamic evaluation, multi-attempt safety measurement, and auditable alignment practice.
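
The audit claim rests on a test of association in a sample of only ten models. The summary does not say which test the preprint uses. As a minimal sketch, assuming a Spearman rank correlation with a permutation test (a common choice for small samples), such a check might look like the following. The function names and all scores are hypothetical illustrations, not code or figures from the paper.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the Spearman correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical scores for ten models (NOT data from the paper).
capability = [61.0, 64.5, 68.2, 70.1, 72.4, 75.0, 78.3, 80.9, 84.6, 88.2]
robustness = [0.42, 0.71, 0.38, 0.66, 0.55, 0.47, 0.73, 0.40, 0.62, 0.51]

rho = spearman(capability, robustness)
p = permutation_p_value(capability, robustness)
print(f"Spearman rho = {rho:.3f}, permutation p = {p:.3f}")
```

With only ten models, such a test has limited power, so a non-significant result is weak evidence of no association rather than proof of independence.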

Evaluation and Benchmarking · AI Safety Research · Goodhart's Law · EvalSafetyGap