Almanac
MMAE

benchmark · active · provisional · mmae-bb1bee81 · 1 event · first seen 9d ago

Aliases: MMAE

Recent events (1)

5 · arXiv · cs.CL · 9d ago

MMAE: First comprehensive benchmark for instruction-based audio editing across 7 modalities

Researchers introduce MMAE, a 2,000-sample benchmark for evaluating general-purpose instruction-based audio editing systems, covering 7 audio modalities (sound, speech, music, and mixtures) and 6 levels of task complexity. The benchmark uses a rubric-based evaluation framework decomposing tasks into 17,741 verifiable criteria to assess instruction following and context consistency. Evaluation of leading models reveals severe limitations: Exact Match Rate falls below 5% overall and hits 0% on complex mixed-modality tasks, exposing fundamental gaps in current audio editing systems.
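To make the rubric-based scoring concrete, here is a minimal sketch of how a strict all-criteria metric could be computed from per-criterion verdicts. The summary does not define Exact Match Rate, so the definition used here (a sample counts only if every one of its criteria passes) is an assumption, and the function names and toy data are hypothetical rather than taken from the benchmark.

```python
from typing import Dict, List


def exact_match_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of samples whose rubric criteria ALL pass.

    Assumed definition: the source names the metric but does not define it.
    """
    if not results:
        return 0.0
    # A sample with no criteria is not counted as a match.
    exact = sum(1 for checks in results.values() if checks and all(checks))
    return exact / len(results)


def criterion_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of individual criteria that pass, pooled over all samples."""
    total = sum(len(checks) for checks in results.values())
    if total == 0:
        return 0.0
    passed = sum(sum(checks) for checks in results.values())
    return passed / total


# Hypothetical toy verdicts: sample id -> pass/fail per rubric criterion.
toy = {
    "sample_a": [True, True, True],
    "sample_b": [True, False, True, True],
    "sample_c": [False, False],
    "sample_d": [True, True],
}

print(exact_match_rate(toy))
print(criterion_pass_rate(toy))
```

Under this assumed definition, a single failed criterion disqualifies the whole sample, so the strict rate can sit far below the pooled per-criterion rate. With 17,741 criteria over 2,000 samples (about 8.9 per sample on average), an all-pass requirement is demanding, which is consistent with the low scores reported.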