Almanac
6 · arXiv cs.AI (Artificial Intelligence) · 16d ago

StreamMA: Streaming communication in multi-agent reasoning reduces latency and improves accuracy

Researchers introduce StreamMA, a multi-agent reasoning system that streams individual reasoning steps to downstream agents as they are generated, rather than waiting for the full reasoning chain to finish. This pipelining reduces end-to-end latency and also improves accuracy by shielding downstream agents from error-prone late reasoning steps. Evaluated across eight benchmarks, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies, StreamMA outperforms serial and single-agent baselines by an average of 7.3 percentage points. The paper also identifies a 'step-level scaling law', a new scaling dimension orthogonal to agent-count scaling.
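
The latency half of this claim can be illustrated with a toy schedule. The sketch below is not StreamMA's implementation. It assumes a linear chain of agents, a fixed time per reasoning step, and that a downstream agent's step `s` needs only the upstream agent's step `s`. The function name `chain_latency` and its parameters are hypothetical.

```python
def chain_latency(num_agents: int, steps: int, step_time: float, streamed: bool) -> float:
    """Finish time of the last agent in a linear chain (toy model, assumed)."""
    # prev_finish[s] = time at which the upstream agent finished its step s.
    prev_finish = [0.0] * steps
    for _ in range(num_agents):
        clock = 0.0
        cur_finish = []
        for s in range(steps):
            # Streamed: step s can start once upstream step s has arrived.
            # Serial: nothing starts until the upstream chain is complete.
            ready = prev_finish[s] if streamed else prev_finish[-1]
            start = max(clock, ready)
            clock = start + step_time
            cur_finish.append(clock)
        prev_finish = cur_finish
    return prev_finish[-1]


serial = chain_latency(num_agents=3, steps=4, step_time=1.0, streamed=False)
streamed = chain_latency(num_agents=3, steps=4, step_time=1.0, streamed=True)
print(serial, streamed)
```

Under these assumptions, three agents with four one-second steps each finish in 12 seconds serially and 6 seconds when streamed. The model covers latency only and says nothing about the reported accuracy gains.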

Related events (8)

6 · Berkeley AI Research (BAIR) Blog · 1mo ago

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

A BAIR blog post surveys recent progress in parallel reasoning for LLMs, covering methods from simple self-consistency and Best-of-N sampling through structured search (Tree of Thoughts, MCTS) to newer adaptive approaches including ParaThinker, GroupThink, and Hogwild! Inference. The core motivation is that sequential reasoning scales linearly with exploration depth, causing latency, context-rot, and compute inefficiency. Adaptive parallel reasoning aims to let models themselves decide when and how to decompose tasks into concurrent threads, rather than imposing fixed parallel structure externally. The post frames this as an emerging inference-time scaling paradigm with implications for agentic and complex reasoning workloads.
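
The two simplest methods named here, self-consistency and Best-of-N, can be sketched in a few lines. This is an illustrative sketch, not code from the post: the sampled answers are hard-coded stand-ins for model outputs, and `self_consistency`, `best_of_n`, and the `score` callable are hypothetical names.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def self_consistency(answers: Sequence[str]) -> Tuple[str, float]:
    """Majority vote over independently sampled final answers."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    # Return the winning answer and the share of samples that agree with it.
    return answer, votes / len(answers)


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Keep the single candidate that an external scorer rates highest."""
    return max(candidates, key=score)


# Stand-in samples; a real system would draw these from a model in parallel.
samples = ["42", "41", "42", "42", "7"]
print(self_consistency(samples))
print(best_of_n(["a", "bbb", "cc"], score=len))
```

Both methods fix the degree of parallelism in advance (the number of samples), which is the externally imposed structure that adaptive approaches aim to replace.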

5 · arXiv · cs.CL · 17d ago

ACTS: Agentic Chain-of-Thought Steering for efficient and controllable LLM reasoning

Researchers introduce Agentic Chain-of-Thought Steering (ACTS), a framework that formulates inference-time reasoning control as a Markov decision process, where a controller agent adaptively steers a frozen reasoner by issuing reasoning strategy directives and steering phrases at each step. The controller is initialized from synthetic steering trajectories with multi-budget augmentation and further optimized via reinforcement learning with budget-conditioned reward shaping. ACTS matches full-thinking performance with significant token savings and enables controllable accuracy-efficiency trade-offs across multiple benchmarks and reasoner models.

5 · arXiv · cs.CL · 5d ago

AdaSR: Adaptive streaming reasoning framework with Hierarchical Relative Policy Optimization

Researchers introduce AdaSR, a framework enabling large reasoning models to reason incrementally during streaming input (e.g., audio/video) rather than waiting for complete context, then perform final deliberation once the stream ends. The core contribution is Hierarchical Relative Policy Optimization (HRPO), which decomposes policy optimization into streaming and deep reasoning phases with fine-grained per-phase advantage assignment, integrating format, accuracy, and latency-aware rewards. Experiments show AdaSR improves the tradeoff among reasoning accuracy, computational efficiency, and streaming latency over supervised fine-tuning baselines. Code is publicly released.

5 · arXiv · cs.CL · 2d ago

Multi-Agent Fictitious Play (MAFP) applies game-theoretic equilibrium-seeking to LLM decision-making

Researchers propose Multi-Agent Fictitious Play (MAFP), a multi-agent system paradigm that frames LLM-based decision-making as an equilibrium-seeking process borrowed from game theory. Each agent represents a stakeholder stance and iteratively best-responds to the empirical mixture of other agents' past decisions, addressing what the authors call 'stance entanglement' — mutual interdependence among stakeholder decisions that cannot be decomposed into independent subtasks. MAFP is evaluated on competitive strategy tasks and outperforms single-round and multi-round baselines on tournament strength and robustness metrics. The work extends the MAS literature beyond divide-and-conquer execution patterns into interdependent decision scenarios.
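
For readers unfamiliar with the underlying game-theoretic idea, the sketch below shows classical two-player fictitious play on a small matrix game. It is not MAFP's LLM-based procedure: the players are plain payoff tables, the game (matching pennies) is chosen only for illustration, and the function name `fictitious_play` is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fictitious_play(payoff_a: Matrix, payoff_b: Matrix, rounds: int) -> Tuple[List[float], List[float]]:
    """Each player best-responds to the opponent's empirical action counts."""
    n_a, n_b = len(payoff_a), len(payoff_a[0])
    counts_a = [0] * n_a
    counts_b = [0] * n_b
    # Seed the history with one arbitrary action per player (an assumption).
    counts_a[0] += 1
    counts_b[0] += 1
    for _ in range(rounds):
        # Expected payoff against the empirical mixture, up to a constant factor.
        best_a = max(range(n_a), key=lambda i: sum(payoff_a[i][j] * counts_b[j] for j in range(n_b)))
        best_b = max(range(n_b), key=lambda j: sum(payoff_b[i][j] * counts_a[i] for i in range(n_a)))
        # Both players update simultaneously after choosing.
        counts_a[best_a] += 1
        counts_b[best_b] += 1
    total_a, total_b = sum(counts_a), sum(counts_b)
    return [c / total_a for c in counts_a], [c / total_b for c in counts_b]


# Matching pennies: A wins on a match, B wins on a mismatch (zero-sum).
pay_a = [[1, -1], [-1, 1]]
pay_b = [[-1, 1], [1, -1]]
freq_a, freq_b = fictitious_play(pay_a, pay_b, rounds=2000)
print(freq_a, freq_b)
```

In this zero-sum game the empirical frequencies drift toward the mixed equilibrium of one half per action, even though every individual move is a deterministic best response.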

4 · arXiv · cs.CL · 1mo ago

MA²P: A Meta-Cognitive Multi-Agent Framework for Complex Persuasion

The paper introduces MA²P, a multi-agent framework designed for complex persuasion tasks where the persuadee's internal states are latent. The system coordinates perception management, mental-state inference, strategy execution, memory, and evaluation modules, and adds a meta-cognitive configurator that selects domain-appropriate strategies from a structured knowledge base to reduce cross-domain performance variance. Experiments show higher persuasion success rates compared to baselines. The work addresses a known weakness of LLMs in producing generic or weakly grounded persuasive responses.

4 · Hugging Face Blog · 1mo ago

DeepMath: A Lightweight Math Reasoning Agent with smolagents

Hugging Face published a blog post introducing DeepMath, a lightweight mathematical reasoning agent built on the smolagents framework. The post demonstrates how to construct a capable math reasoning agent using small models and tool-use patterns. This represents a practical application of the agent-tool ecosystem for specialized reasoning tasks.

5 · Hugging Face Blog · 1mo ago

DABStep: Data Agent Benchmark for Multi-step Reasoning

Hugging Face introduces DABStep, a benchmark designed to evaluate data agents on multi-step reasoning tasks. The benchmark targets agentic systems that must perform complex, sequential data operations rather than single-step queries. It aims to fill a gap in evaluation tooling for realistic data analysis workflows involving tool use and chained reasoning.

6 · arXiv · cs.AI · 12d ago

MemDreamer: Hierarchical graph memory and agentic retrieval for long video understanding

MemDreamer is a plug-and-play framework that decouples perception and reasoning for long-video understanding by incrementally building a three-tier Hierarchical Graph Memory capturing spatiotemporal and causal relations. During inference, a reasoning model uses an Observation-Reason-Action loop with agentic tool-augmented retrieval to navigate the memory graph, constraining the context window to 2% of full-context ingestion while achieving a 12.5-point absolute accuracy gain. The system reaches SOTA on four benchmarks, narrowing the gap with human experts to 3.7 points. The authors also report a strong linear correlation between logical reasoning performance and long-video understanding, proposing agentic capability scaling as a new paradigm for multimodal comprehension.