StreamMA: Streaming communication in multi-agent reasoning reduces latency and improves accuracy
Researchers introduce StreamMA, a multi-agent reasoning system that streams individual reasoning steps to downstream agents as they are generated, rather than waiting for a complete chain. This pipelining approach reduces end-to-end latency and also improves accuracy by shielding downstream agents from error-prone late reasoning steps. Evaluated across eight benchmarks, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies, StreamMA outperforms serial and single-agent baselines by an average of 7.3 percentage points. The paper also identifies a 'step-level scaling law' — a new scaling dimension orthogonal to agent-count scaling.
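To make the latency argument concrete, here is a minimal Python sketch of step-level streaming between two agents. It is an illustration under stated assumptions, not the paper's implementation: the agent functions, the `events` log, and the latency formulas are hypothetical, and the streamed formula assumes an idealized chain in which every step takes the same time and an agent can begin its step i as soon as the previous agent has emitted its own step i.

```python
from typing import Iterator, List

events: List[str] = []  # records the order in which steps are produced and consumed


def upstream_agent(n_steps: int) -> Iterator[str]:
    """Hypothetical agent that emits reasoning steps one at a time."""
    for i in range(1, n_steps + 1):
        events.append(f"A emits {i}")
        yield f"A-step{i}"


def downstream_agent(steps: Iterator[str]) -> Iterator[str]:
    """Hypothetical agent that reacts to each upstream step as soon as it arrives."""
    for i, step in enumerate(steps, start=1):
        events.append(f"B consumes {i}")
        yield f"B-uses[{step}]"


def serial_latency(n_agents: int, n_steps: int, step_time: float) -> float:
    # Serial baseline: each agent waits for the previous agent's complete chain.
    return n_agents * n_steps * step_time


def streamed_latency(n_agents: int, n_steps: int, step_time: float) -> float:
    # Idealized pipeline (assumption): agent k finishes step i at time (k + i - 1) * step_time.
    return (n_agents + n_steps - 1) * step_time


# Generators are lazy, so production and consumption interleave step by step.
trace = list(downstream_agent(upstream_agent(3)))
```

Under these assumptions, a chain of 3 agents with 4 steps each takes 12 time units serially and 6 when streamed. Real systems will deviate from this, since step durations vary and a downstream agent may need more than one upstream step before it can proceed.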
Related events (8)
Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling
A BAIR blog post surveys recent progress in parallel reasoning for LLMs, covering methods from simple self-consistency and Best-of-N sampling through structured search (Tree of Thoughts, MCTS) to newer adaptive approaches including ParaThinker, GroupThink, and Hogwild! Inference. The core motivation is that the cost of sequential reasoning grows linearly with exploration depth, which increases latency, worsens context rot, and wastes compute. Adaptive parallel reasoning aims to let models themselves decide when and how to decompose tasks into concurrent threads, rather than imposing fixed parallel structure externally. The post frames this as an emerging inference-time scaling paradigm with implications for agentic and complex reasoning workloads.
ACTS: Agentic Chain-of-Thought Steering for efficient and controllable LLM reasoning
Researchers introduce Agentic Chain-of-Thought Steering (ACTS), a framework that formulates inference-time reasoning control as a Markov decision process, where a controller agent adaptively steers a frozen reasoner by issuing reasoning strategy directives and steering phrases at each step. The controller is initialized from synthetic steering trajectories with multi-budget augmentation and further optimized via reinforcement learning with budget-conditioned reward shaping. ACTS matches full-thinking performance with significant token savings and enables controllable accuracy-efficiency trade-offs across multiple benchmarks and reasoner models.
AdaSR: Adaptive streaming reasoning framework with Hierarchical Relative Policy Optimization
Researchers introduce AdaSR, a framework enabling large reasoning models to reason incrementally during streaming input (e.g., audio/video) rather than waiting for complete context, then perform final deliberation once the stream ends. The core contribution is Hierarchical Relative Policy Optimization (HRPO), which decomposes policy optimization into streaming and deep reasoning phases with fine-grained per-phase advantage assignment, integrating format, accuracy, and latency-aware rewards. Experiments show AdaSR improves the tradeoff among reasoning accuracy, computational efficiency, and streaming latency over supervised fine-tuning baselines. Code is publicly released.
Multi-Agent Fictitious Play (MAFP) applies game-theoretic equilibrium-seeking to LLM decision-making
Researchers propose Multi-Agent Fictitious Play (MAFP), a multi-agent system paradigm that frames LLM-based decision-making as an equilibrium-seeking process borrowed from game theory. Each agent represents a stakeholder stance and iteratively best-responds to the empirical mixture of other agents' past decisions, addressing what the authors call 'stance entanglement' — mutual interdependence among stakeholder decisions that cannot be decomposed into independent subtasks. MAFP is evaluated on competitive strategy tasks and outperforms single-round and multi-round baselines on tournament strength and robustness metrics. The work extends the MAS literature beyond divide-and-conquer execution patterns into interdependent decision scenarios.
MA²P: A Meta-Cognitive Multi-Agent Framework for Complex Persuasion
The paper introduces MA²P, a multi-agent framework designed for complex persuasion tasks where the persuadee's internal states are latent. The system coordinates perception management, mental-state inference, strategy execution, memory, and evaluation modules, and adds a meta-cognitive configurator that selects domain-appropriate strategies from a structured knowledge base to reduce cross-domain performance variance. Experiments show higher persuasion success rates compared to baselines. The work addresses a known weakness of LLMs in producing generic or weakly grounded persuasive responses.
DeepMath: A Lightweight Math Reasoning Agent with smolagents
Hugging Face published a blog post introducing DeepMath, a lightweight mathematical reasoning agent built on the smolagents framework. The post demonstrates how to construct a capable math reasoning agent using small models and tool-use patterns. This represents a practical application of the agent-tool ecosystem for specialized reasoning tasks.
DABStep: Data Agent Benchmark for Multi-step Reasoning
Hugging Face introduces DABStep, a benchmark designed to evaluate data agents on multi-step reasoning tasks. The benchmark targets agentic systems that must perform complex, sequential data operations rather than single-step queries. It aims to fill a gap in evaluation tooling for realistic data analysis workflows involving tool use and chained reasoning.
MemDreamer: Hierarchical graph memory and agentic retrieval for long video understanding
MemDreamer is a plug-and-play framework that decouples perception and reasoning for long-video understanding by incrementally building a three-tier Hierarchical Graph Memory capturing spatiotemporal and causal relations. During inference, a reasoning model uses an Observation-Reason-Action loop with agentic tool-augmented retrieval to navigate the memory graph, constraining the context window to 2% of full-context ingestion while achieving a 12.5-point absolute accuracy gain. The system reaches SOTA on four benchmarks, narrowing the gap with human experts to 3.7 points. The authors also report a strong linear correlation between logical reasoning performance and long-video understanding, proposing agentic capability scaling as a new paradigm for multimodal comprehension.


