MECoBench

benchmark · active · provisional · mecobench-d9b03d67 · 1 event · first seen 2d ago

Aliases: MECoBench

Recent events (1)

5 · arXiv · cs.CL · 2d ago

MECoBench: Benchmark for Multimodal Agent Collaboration in Embodied Environments

Researchers introduce MECoBench, a benchmark and evaluation platform for assessing how multimodal LLMs collaborate in visually grounded embodied environments. The benchmark spans diverse real-world tasks, two cooperation structures, and three collaboration modes. The authors report three key findings: collaboration generally improves task completion, although the benefit depends on whether the gains outweigh the added coordination complexity; communication is essential to realizing those benefits; and collaboration improves robustness under noisy conditions.