5 · arXiv cs.CL (Computation and Language) · 2d ago

Multi-Agent Fictitious Play (MAFP) applies game-theoretic equilibrium-seeking to LLM decision-making

Researchers propose Multi-Agent Fictitious Play (MAFP), a multi-agent system (MAS) paradigm that frames LLM-based decision-making as an equilibrium-seeking process borrowed from game theory. Each agent represents a stakeholder stance and iteratively best-responds to the empirical mixture of the other agents' past decisions. This targets what the authors call 'stance entanglement': mutual interdependence among stakeholder decisions that cannot be decomposed into independent subtasks. MAFP is evaluated on competitive strategy tasks and outperforms single-round and multi-round baselines on tournament strength and robustness metrics. The work extends the MAS literature beyond divide-and-conquer execution patterns into interdependent decision scenarios.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play · Multi-Agent Fictitious Play
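Fictitious play is a classical learning rule: each player keeps counts of every action the opponent has played and best-responds to that empirical mixture. The sketch below is the textbook two-player version on rock-paper-scissors, not the MAFP system itself; in MAFP the players are LLM agents standing for stakeholder stances, which this toy does not model. The payoff matrix, the one-count starting prior, and the round count are illustrative assumptions. For two-player zero-sum games, a classical result is that the empirical frequencies converge to a Nash equilibrium, here the uniform mix.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Rock-paper-scissors payoffs for the row player; the game is zero-sum,
# so the column player's payoff is the negative.
PAYOFF: Matrix = [
    [0, -1, 1],   # rock     vs (rock, paper, scissors)
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def best_response(payoff: Matrix, opp_counts: List[int]) -> int:
    """Action with the highest expected payoff against the opponent's empirical mixture."""
    total = sum(opp_counts)
    mixture = [c / total for c in opp_counts]
    expected = [sum(p * q for p, q in zip(row, mixture)) for row in payoff]
    return max(range(len(expected)), key=lambda a: expected[a])


def fictitious_play(payoff: Matrix, rounds: int = 10_000) -> Tuple[List[float], List[float]]:
    n = len(payoff)  # assumes a square game
    # Column player's payoffs indexed [own action][row player's action].
    col_payoff = [[-payoff[a][b] for a in range(n)] for b in range(n)]
    # Start each history with one fictitious play of every action (assumed prior).
    row_counts = [1] * n
    col_counts = [1] * n
    for _ in range(rounds):
        # Both players best-respond simultaneously to the other's history so far.
        a = best_response(payoff, col_counts)
        b = best_response(col_payoff, row_counts)
        row_counts[a] += 1
        col_counts[b] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return [c / row_total for c in row_counts], [c / col_total for c in col_counts]


row_freq, col_freq = fictitious_play(PAYOFF)
print([round(p, 3) for p in row_freq])  # close to the mixed equilibrium (1/3, 1/3, 1/3)
```

Individual rounds keep cycling between pure actions; only the long-run frequencies settle, which is why the rule is stated in terms of the empirical mixture rather than the last move.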

Related guides (2)

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

4 · arXiv · cs.CL · 1mo ago

MA²P: A Meta-Cognitive Multi-Agent Framework for Complex Persuasion

The paper introduces MA²P, a multi-agent framework designed for complex persuasion tasks where the persuadee's internal states are latent. The system coordinates perception management, mental-state inference, strategy execution, memory, and evaluation modules, and adds a meta-cognitive configurator that selects domain-appropriate strategies from a structured knowledge base to reduce cross-domain performance variance. Experiments show higher persuasion success rates compared to baselines. The work addresses a known weakness of LLMs in producing generic or weakly grounded persuasive responses.

Agent and Tool Ecosystem · Alignment and RLHF · large language models · meta-cognitive configurator · MA²P

6 · arXiv · cs.CL · 12d ago

Agentopia: Long-term multi-agent life simulation framework for training LLMs on social behavior

Researchers introduce Agentopia, a framework for simulating 10 years of social life across 100 LLM-powered agents, enabling the study of emergent social behaviors and long-term personal growth dynamics. The system defines a 'life reward' metric mirroring human well-being and uses it to train LLMs via rejection sampling. Training on simulated social experience yields a +15.6% improvement on downstream role-playing benchmarks, suggesting that synthetic social simulation can generalize to real capability gains.

Agent and Tool Ecosystem · Alignment and RLHF · Agentopia · Agentopia: Long-Term Life Simulation and Learning in Agent Societies
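Rejection sampling for fine-tuning data generally means: sample several completions, score them with a reward, and keep only the ones that score well as training examples. The summary does not say how many samples Agentopia draws, what threshold it uses, or whether it keeps best-of-k or everything above a cutoff, so the sketch below assumes best-of-k with a fixed threshold. `toy_policy` and `toy_life_reward` are hypothetical stand-ins for the LLM and the paper's life-reward metric.

```python
import random
from typing import Callable, List, Tuple

SampleFn = Callable[[str, random.Random], str]
RewardFn = Callable[[str, str], float]


def rejection_sample(
    prompts: List[str],
    sample_fn: SampleFn,
    reward_fn: RewardFn,
    k: int = 8,
    threshold: float = 0.0,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Best-of-k rejection sampling: keep each prompt's top-scoring completion if it clears `threshold`."""
    rng = random.Random(seed)
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = [sample_fn(prompt, rng) for _ in range(k)]
        scored = [(reward_fn(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda sc: sc[0])
        if best_score > threshold:
            kept.append((prompt, best))  # becomes a fine-tuning example
    return kept


# Hypothetical stand-ins for an LLM policy and a "life reward" scorer.
ACTIONS = ["help a neighbor", "skip work", "start an argument"]
TOY_REWARD = {"help a neighbor": 1.0, "skip work": -0.5, "start an argument": -1.0}


def toy_policy(prompt: str, rng: random.Random) -> str:
    return rng.choice(ACTIONS)


def toy_life_reward(prompt: str, completion: str) -> float:
    return TOY_REWARD[completion]


dataset = rejection_sample(["Day 1: a neighbor asks for help."], toy_policy, toy_life_reward)
print(dataset)
```

Prompts whose samples all fall below the threshold contribute nothing, so the kept set is biased toward situations the current model already handles well; that trade-off applies to rejection sampling in general, not specifically to Agentopia.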

4 · arXiv · cs.AI · 5d ago

PCMA: Learning coordinated agent-specific preferences for multi-objective multi-agent RL

A new arXiv preprint introduces Preference Coordinated Multi-agent Policy Optimization (PCMA), a method for cooperative multi-objective multi-agent reinforcement learning (MOMARL) that learns agent-specific preferences to enable complementary trade-offs across agents. The authors formulate cooperative MOMARL as a team-optimal game and provide a first-order improvement decomposition showing that preference diversity can induce team improvement. Experiments on cooperative MOMA environments and a traffic-control scenario demonstrate improvements in both performance and trade-off coordination.

Agent and Tool Ecosystem · Preference Coordinated Multi-agent Policy Optimization
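The summary does not give PCMA's parameterization of agent-specific preferences. A common way to express a preference over several objectives is linear scalarization: a weight vector on the probability simplex turns the vector reward into a scalar. The sketch assumes that, plus a softmax over per-agent logits as the learnable parameterization; the agent names, objectives, and numbers are hypothetical, and it does not implement PCMA's team-optimal game or its improvement decomposition.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Map unconstrained logits to a preference vector on the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def scalarize(reward_vec: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: preference-weighted sum of the objective rewards."""
    return sum(w * r for w, r in zip(weights, reward_vec))


# Hypothetical two-objective traffic example: (throughput gain, negative queue length).
team_reward = [1.0, -0.5]
# Hypothetical learned preference logits, one vector per agent.
preference_logits: Dict[str, List[float]] = {
    "intersection_0": [2.0, 0.0],  # leans toward throughput
    "intersection_1": [0.0, 2.0],  # leans toward short queues
}
for agent, logits in preference_logits.items():
    w = softmax(logits)
    print(agent, [round(x, 3) for x in w], round(scalarize(team_reward, w), 3))
```

The same team reward is valued differently by each agent, which is the kind of preference diversity the PCMA summary says can induce complementary trade-offs.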

5 · arXiv · cs.AI · 11d ago

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Role-Agent is a new framework that uses a single LLM simultaneously as both agent and environment, enabling self-bootstrapped co-evolution without external environment feedback. The system has two components: World-In-Agent (WIA), which uses predicted vs. actual state alignment as a process reward, and Agent-In-World (AIW), which reshapes training data by retrieving tasks with similar failure patterns. Experiments across multiple benchmarks show an average performance gain of over 4% over strong baselines. The approach addresses key limitations in LLM agent training: inefficient feedback and static environments.

Agent and Tool Ecosystem · Alignment and RLHF · Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution · World-In-Agent
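World-In-Agent turns the agreement between a predicted next state and the actual next state into a per-step (process) reward. The summary does not say how states are represented or how agreement is measured, so the sketch assumes each state is a set of observable facts and uses Jaccard overlap as the alignment score. The fact names are hypothetical, and in Role-Agent the prediction would come from the same LLM acting as the environment, whereas here it is passed in as data.

```python
from typing import FrozenSet, List, Sequence

State = FrozenSet[str]  # assumption: a state is a set of observable facts


def alignment_reward(predicted: State, actual: State) -> float:
    """Jaccard overlap between the predicted and the observed next state, in [0, 1]."""
    union = predicted | actual
    if not union:
        return 1.0  # both empty: treat as perfect agreement
    return len(predicted & actual) / len(union)


def process_rewards(predictions: Sequence[State], observations: Sequence[State]) -> List[float]:
    """One dense reward per step instead of a single end-of-episode signal."""
    return [alignment_reward(p, o) for p, o in zip(predictions, observations)]


preds = [frozenset({"door_open"}), frozenset({"door_open", "holding_key"})]
obs = [frozenset({"door_open"}), frozenset({"door_open"})]
print(process_rewards(preds, obs))  # [1.0, 0.5]
```

Dense per-step scores like these address the "inefficient feedback" problem the summary mentions, because every step receives a signal even when the episode as a whole fails.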

6 · arXiv · cs.CL · 19d ago

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information

This paper studies LLM agents in simulated bargaining scenarios under varying information regimes (complete, asymmetric, and uncertain), evaluating how closely they match game-theoretic equilibria and whether they tend toward honesty or deception. Off-the-shelf LLMs deviate substantially from equilibria and attempt deception, yet fail to exploit information asymmetries efficiently. Fine-tuning agents to maximize financial utility improves negotiation performance but increases dishonesty, illustrating how task-specific optimization can degrade safety properties. Code and a dataset of bargaining scenarios are released.

AI Safety Research · Agent and Tool Ecosystem · Game-Theoretic Equilibria · LLM Bargaining Agents · Bargaining Scenarios Dataset

4 · GitHub Trending · 19d ago

TradingAgents: Multi-Agent LLM Financial Trading Framework

TradingAgents is an open-source Python framework by TauricResearch that applies multi-agent LLM architectures to financial trading tasks. At the time of listing, the repository had accumulated 81,650 GitHub stars, with 284 added that day, indicating strong community traction. It represents a concrete deployment pattern for agentic AI systems in quantitative finance.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · TauricResearch · TradingAgents

6 · arXiv · cs.AI · 16d ago

StreamMA: Streaming communication in multi-agent reasoning reduces latency and improves accuracy

Researchers introduce StreamMA, a multi-agent reasoning system that streams individual reasoning steps to downstream agents as they are generated, rather than waiting for a complete chain. This pipelining approach reduces end-to-end latency and also improves accuracy by shielding downstream agents from error-prone late reasoning steps. Evaluated across eight benchmarks, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies, StreamMA outperforms serial and single-agent baselines by an average of 7.3 percentage points. The paper also identifies a 'step-level scaling law' — a new scaling dimension orthogonal to agent-count scaling.

Frontier Model Releases · Agent and Tool Ecosystem · HMMT 2026 · Claude Opus 4.6 · GPT-5.5
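The latency argument for streaming can be checked with a small timing model. Serially, the downstream agent starts only after the whole upstream chain is done. When streamed, each step is forwarded as soon as it exists, and the downstream agent works on it while later steps are still being generated. The per-step costs below are hypothetical and unit-free, and the model assumes the downstream agent can process steps incrementally and in order. It covers only the latency side; it says nothing about StreamMA's accuracy effect or its step-level scaling law.

```python
from itertools import accumulate
from typing import Sequence


def serial_latency(up: Sequence[float], down: Sequence[float]) -> float:
    """Downstream agent waits for the complete upstream chain before starting."""
    return sum(up) + sum(down)


def streamed_latency(up: Sequence[float], down: Sequence[float]) -> float:
    """Each upstream step is forwarded as soon as it is generated; the downstream
    agent processes steps in order, one at a time."""
    ready = accumulate(up)  # time at which upstream step i becomes available
    finish = 0.0
    for ready_at, cost in zip(ready, down):
        # Start step i once it has arrived and the previous step is done.
        finish = max(finish, ready_at) + cost
    return finish


up_steps = [2.0, 2.0, 2.0, 2.0]    # hypothetical per-step generation times
down_steps = [1.0, 1.0, 1.0, 1.0]  # hypothetical per-step processing times
print(serial_latency(up_steps, down_steps))    # 12.0
print(streamed_latency(up_steps, down_steps))  # 9.0
```

With streaming, only the processing of the final step remains after the upstream agent finishes, so the saving grows with the number of steps the downstream agent can overlap.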

6 · arXiv · cs.CL · 15d ago

MLEvolve: Self-evolving multi-agent framework for automated ML algorithm discovery

MLEvolve is a new LLM-based multi-agent framework for end-to-end machine learning algorithm discovery, addressing limitations of existing MLE agents such as information isolation and memoryless search. The system introduces Progressive MCGS (a graph-extended tree search), a Retrospective Memory for accumulating experience, and a design that decouples strategic planning from code generation. Evaluated on MLE-bench, it achieves state-of-the-art medal and valid-submission rates within a 12-hour budget, and it also outperforms AlphaEvolve on mathematical algorithm optimization tasks.

Evaluation and Benchmarking · Agent and Tool Ecosystem · MLEvolve · MLE-bench · Progressive MCGS
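Monte Carlo tree and graph searches typically choose which candidate to expand next with a UCB-style rule that balances a node's average score against how rarely it has been tried. The sketch below shows generic UCT selection (UCB1 applied to search nodes), not MLEvolve's Progressive MCGS. It does not implement the graph extension, in which a node can be reached from several parents. The node names, scores, and exploration constant are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    visits: int = 0
    value_sum: float = 0.0  # e.g. accumulated validation scores of evaluated solutions
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 applied to search nodes: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited candidates first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node, c: float = math.sqrt(2)) -> Node:
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


root = Node("baseline", visits=10, children=[
    Node("add_feature_x", visits=5, value_sum=4.0),
    Node("swap_model", visits=5, value_sum=2.0),
])
print(select_child(root).name)  # add_feature_x
```

Both children have equal visit counts here, so their exploration bonuses match and the higher mean score decides. An unvisited child would be selected before either of them.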