MECoBench
benchmark · active · provisional
mecobench-d9b03d67 · 1 event · first seen 2d ago
Aliases: MECoBench
Recent events (1)
MECoBench: Benchmark for Multimodal Agent Collaboration in Embodied Environments
Researchers introduce MECoBench, a benchmark and evaluation platform for assessing how multimodal LLMs collaborate in visually grounded embodied environments. The benchmark spans diverse real-world tasks, two cooperation structures, and three collaboration modes. Three key findings are reported: collaboration generally improves task completion, though the benefit depends on whether its gains outweigh the added coordination complexity; communication is essential to realizing those benefits; and collaboration improves robustness under noisy conditions.