benchmark
ParaPairAudioBench
benchmark · active · provisional
parapairaudiobench-a96bfd4e · 1 event · first seen 19h ago · Aliases: ParaPairAudioBench
Recent events (1)
ParaPairAudioBench: Pairwise benchmark reveals large gaps in LALM paralinguistic judgment
Researchers introduce ParaPairAudioBench, a pairwise benchmark of 5,175 audio pairs spanning five paralinguistic dimensions (Style, Rate, Emphasis, Age, Gender), designed to evaluate Large Audio-Language Models (LALMs) as judges. Experiments show that current LALMs trail human judgment by 32 percentage points on average and exhibit severe calibration failures, especially in ambiguous 'Tie' cases. The benchmark includes same-transcript and cross-transcript conditions to separate reliance on lexical content from reliance on acoustic cues, enabling a more rigorous assessment of LALM reliability for speech evaluation.
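As a rough illustration of how pairwise judgments of this kind might be scored, the sketch below computes accuracy per dimension and per transcript condition, plus the share of human-labelled 'Tie' pairs that a model also calls a tie. The record layout (`PairJudgment`), the label strings `"A"`, `"B"`, `"Tie"`, the condition names `"same"` and `"cross"`, and the toy records are all assumptions made for this example; they are not the benchmark's actual data format, metrics, or results, and tie recall here is only a simple stand-in for a proper calibration analysis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairJudgment:
    """One judged audio pair (hypothetical schema, not the benchmark's)."""
    dimension: str  # e.g. "Style", "Rate", "Emphasis", "Age", "Gender"
    condition: str  # "same" or "cross" transcript (assumed names)
    gold: str       # human label: "A", "B", or "Tie"
    predicted: str  # model label: "A", "B", or "Tie"


def accuracy_by(records, key):
    """Return {group: accuracy}, grouping records with the `key` function."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = key(r)
        totals[group] += 1
        hits[group] += int(r.predicted == r.gold)
    return {g: hits[g] / totals[g] for g in totals}


def tie_recall(records):
    """Fraction of gold 'Tie' pairs the model also labels 'Tie' (None if no ties)."""
    ties = [r for r in records if r.gold == "Tie"]
    if not ties:
        return None
    return sum(r.predicted == "Tie" for r in ties) / len(ties)


# Toy records, invented purely to exercise the functions above.
toy_records = [
    PairJudgment("Rate", "same", "A", "A"),
    PairJudgment("Rate", "cross", "B", "A"),
    PairJudgment("Emphasis", "same", "Tie", "A"),
    PairJudgment("Emphasis", "cross", "Tie", "Tie"),
]

per_dimension = accuracy_by(toy_records, lambda r: r.dimension)
per_condition = accuracy_by(toy_records, lambda r: r.condition)
ties_found = tie_recall(toy_records)
```

Comparing `per_condition["same"]` with `per_condition["cross"]` is one simple way to see whether a judge's accuracy depends on the transcript being held fixed, which is the kind of contrast the same-transcript and cross-transcript conditions are meant to expose.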