benchmark

MAS-PromptBench

benchmark · active · provisional · mas-promptbench-74048eb1 · 1 event · first seen 7h ago

Aliases: MAS-PromptBench

More like this (12)

ENPMR-Bench, MemBench, ProgramBench, SPBench, MLE Bench Lite, TriggerBench, CursorBench, RepoBench, MT-Bench, LiveBench, MLE-bench, PowerCodeBench

Recent events (1)

5 · arXiv · cs.LG · 7h ago

MAS-PromptBench: Systematic study of prompt optimization in multi-agent LLM systems

A new arXiv preprint introduces MAS-PromptBench, a benchmark and study examining when and how much system-prompt optimization improves multi-agent LLM systems (MAS). The authors evaluate two prompt optimizers across diverse MAS configurations varying in task, workflow, communication protocol, and team size. Results show prompt optimization can unlock significant gains but also expose open challenges, particularly around the exponentially growing search space as agent count increases.
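
To make the search-space remark concrete, here is a minimal sketch of the arithmetic, assuming each agent independently picks one system prompt from a fixed pool of candidates. The pool size of 10, the function names, and the joint-assignment framing are illustrative assumptions; they are not taken from the preprint, whose optimizers may structure the search differently.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def joint_search_space_size(num_agents: int, candidates_per_agent: int) -> int:
    """Count joint prompt assignments when every agent picks one candidate.

    Hypothetical model: choices are independent, so the count is k ** n.
    """
    return candidates_per_agent ** num_agents


def enumerate_joint_assignments(
    candidates: Sequence[str], num_agents: int
) -> Iterator[Tuple[str, ...]]:
    """Yield every joint assignment; only practical for very small teams."""
    return product(candidates, repeat=num_agents)


if __name__ == "__main__":
    # Assumed pool of 10 candidate prompts per agent (illustrative only).
    for team_size in (1, 2, 4, 8):
        size = joint_search_space_size(team_size, 10)
        print(f"{team_size} agents -> {size} joint assignments")
```

Under these assumptions, doubling the team size squares the number of joint assignments, which is why exhaustive evaluation stops being practical after only a few agents.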

Evaluation and Benchmarking · Agent and Tool Ecosystem · MAS-PromptBench