BINEVAL
product · active · provisional
bineval-ca002b5e · 1 event · first seen 2d ago
Aliases: BINEVAL
Recent events (1)
BINEVAL: Binary question decomposition for interpretable LLM evaluation and prompt optimization
Researchers introduce BINEVAL, a framework that decomposes LLM evaluation criteria into atomic binary yes/no questions, aggregating answers into multi-dimensional interpretable scores. The approach matches or outperforms baselines including UniEval and G-Eval on SummEval, Topical-Chat, and QAGS benchmarks, with particular strength on factual consistency. Beyond evaluation, the binary question feedback is shown to support iterative prompt optimization in both self-update and cross-model settings on IFBench. The framework is training-free and task-agnostic, addressing opacity and ceiling-effect problems common in holistic LLM judges.
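To make the decomposition idea concrete, here is a minimal sketch of how binary answers could be rolled up into per-dimension scores. It is not the authors' implementation: the names `BinaryQuestion`, `evaluate`, and `toy_judge` are hypothetical, the example questions are invented, and averaging "yes" answers per dimension is an assumed aggregation rule that the summary above does not specify. The judge is a canned lookup standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical type: not taken from the BINEVAL paper.
@dataclass(frozen=True)
class BinaryQuestion:
    dimension: str  # evaluation dimension, e.g. "consistency"
    text: str       # one atomic yes/no question


# A judge maps (source, output, question) to True ("yes") or False ("no").
Judge = Callable[[str, str, BinaryQuestion], bool]


def evaluate(
    source: str,
    output: str,
    questions: List[BinaryQuestion],
    judge: Judge,
) -> Tuple[Dict[str, float], List[BinaryQuestion]]:
    """Return per-dimension scores and the questions answered "no".

    Assumption: a dimension's score is the fraction of its questions
    answered "yes". The real framework may aggregate differently.
    """
    answers: Dict[str, List[bool]] = {}
    failed: List[BinaryQuestion] = []
    for q in questions:
        ok = judge(source, output, q)
        answers.setdefault(q.dimension, []).append(ok)
        if not ok:
            failed.append(q)
    scores = {dim: sum(vals) / len(vals) for dim, vals in answers.items()}
    return scores, failed


# Invented example questions and canned answers, for illustration only.
questions = [
    BinaryQuestion("consistency", "Is every date in the summary also in the source?"),
    BinaryQuestion("consistency", "Is every named entity in the summary also in the source?"),
    BinaryQuestion("fluency", "Is the summary free of grammatical errors?"),
]
canned = {
    questions[0].text: True,
    questions[1].text: False,
    questions[2].text: True,
}


def toy_judge(source: str, output: str, q: BinaryQuestion) -> bool:
    # Stand-in for an LLM call: looks up a fixed answer by question text.
    return canned[q.text]


scores, failed = evaluate("source text", "summary text", questions, toy_judge)
print(scores)
print([q.text for q in failed])
```

The list of failed questions is what makes the score interpretable: each "no" names a specific shortcoming, which is the kind of feedback the summary describes being fed back into prompt optimization.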