benchmark

MECoBench

benchmark · active · provisional · mecobench-d9b03d67 · 1 event · first seen 2d ago

Aliases: MECoBench

More like this (12)

MMBench2 MemBench RMCBench FoldBench MaDI-Bench SorryBench EvoBench LiveBench PowerCodeBench CursorBench MTBench RepoBench

Recent events (1)

5 · arXiv · cs.CL · 2d ago

MECoBench: Benchmark for Multimodal Agent Collaboration in Embodied Environments

Researchers introduce MECoBench, a benchmark and evaluation platform for assessing how multimodal LLMs collaborate in visually grounded embodied environments. The benchmark spans diverse real-world tasks, two cooperation structures, and three collaboration modes. The authors report three key findings: collaboration generally improves task completion, though the net benefit depends on balancing its gains against coordination complexity; communication is essential to realizing those gains; and collaboration improves robustness under noisy conditions.

Evaluation and Benchmarking · Agent and Tool Ecosystem · MECoBench