Message Passing Language Models (MPLMs) enable efficient parallel reasoning via inter-thread communication
Researchers introduce Message Passing Language Models (MPLMs), a framework that extends parallel inference-time scaling by letting LLM reasoning threads communicate directly via send/receive primitives rather than operating in isolation, as they do in fork-join approaches. MPLMs reduce computational cost in two ways: they avoid redundant context sharing, and they allow unpromising branches to be terminated early (preemption). The framework is demonstrated on Sudoku puzzles (where it achieves asymptotically smaller context than chain-of-thought (CoT) or fork-join), 3-SAT problems, and long-context QA. A fine-tuned model solves 25×25 Sudoku puzzles that challenge frontier reasoning models.
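The send/receive and preemption ideas can be illustrated with a toy sketch. This is not the MPLM implementation: ordinary Python threads stand in for LLM reasoning threads, and the names `run_threads`, `send`, `receive`, and the `"stop"` message are hypothetical, chosen only for this example. Each thread searches its own branch, and the first one to succeed sends a short message to its peers so they can stop early instead of finishing their branches.

```python
import queue
import threading


def run_threads(branches, is_solution):
    """Explore each branch in its own thread; the first success preempts the rest.

    Illustrative only: `branches` is a list of iterables of candidates, and
    `is_solution` is a predicate. Neither comes from the MPLM paper.
    """
    inboxes = [queue.Queue() for _ in branches]  # one mailbox per thread
    results = queue.Queue()

    def send(dest, message):
        # Deliver a short message to another thread's mailbox.
        inboxes[dest].put(message)

    def receive(me):
        # Non-blocking receive: returns None when the mailbox is empty.
        try:
            return inboxes[me].get_nowait()
        except queue.Empty:
            return None

    def worker(me, candidates):
        for candidate in candidates:
            if receive(me) == "stop":  # a peer already succeeded: preempt
                return
            if is_solution(candidate):
                results.put(candidate)
                for peer in range(len(branches)):
                    if peer != me:
                        send(peer, "stop")  # a small message, not the full context
                return

    threads = [
        threading.Thread(target=worker, args=(i, branch))
        for i, branch in enumerate(branches)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return None if results.empty() else results.get_nowait()


# Three branches split the search space; only the second contains the answer.
branches = [range(0, 1000), range(1000, 2000), range(2000, 3000)]
solution = run_threads(branches, lambda x: x * x == 1_522_756)
print(solution)  # → 1234
```

In this sketch the threads exchange only a one-word message rather than copies of each other's state, which loosely mirrors the summary's point about avoiding redundant context sharing. How much work the preempted threads save depends on scheduling, so the sketch makes no claim about speedup.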
Related events (8)
Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling
A BAIR blog post surveys recent progress in parallel reasoning for LLMs, covering methods from simple self-consistency and Best-of-N sampling through structured search (Tree of Thoughts, MCTS) to newer adaptive approaches including ParaThinker, GroupThink, and Hogwild! Inference. The core motivation is that sequential reasoning scales linearly with exploration depth, causing latency, context-rot, and compute inefficiency. Adaptive parallel reasoning aims to let models themselves decide when and how to decompose tasks into concurrent threads, rather than imposing fixed parallel structure externally. The post frames this as an emerging inference-time scaling paradigm with implications for agentic and complex reasoning workloads.
StreamMA: Streaming communication in multi-agent reasoning reduces latency and improves accuracy
Researchers introduce StreamMA, a multi-agent reasoning system that streams individual reasoning steps to downstream agents as they are generated, rather than waiting for a complete chain. This pipelining approach reduces end-to-end latency and also improves accuracy by shielding downstream agents from error-prone late reasoning steps. Evaluated across eight benchmarks, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies, StreamMA outperforms serial and single-agent baselines by an average of 7.3 percentage points. The paper also identifies a 'step-level scaling law' — a new scaling dimension orthogonal to agent-count scaling.
Tracing the Emergence of Human-Like Pragmatic Reasoning in LLMs Across Languages
Researchers conducted a population-matching experiment evaluating 25 LLMs on conditional inference tasks across four languages, comparing model behavior to matched human populations. The study finds that LLMs function as accurate semantic operators but systematically fail to capture pragmatic enrichments—context-sensitive inferences beyond literal logical meaning—that humans apply effortlessly. Model performance on pragmatic reasoning is not predicted by open vs. closed weights, training orientation, or architecture type, suggesting pragmatic reasoning remains an emergent and unreliable capability. The findings contribute to ongoing debates about whether LLMs reason like humans or merely approximate surface-level linguistic patterns.
PPC: Preplan-Plan-CoT Framework for LLM Mathematical Reasoning
This paper introduces PPC (Preplan-Plan-CoT), a reasoning framework that adds an explicit problem-understanding stage (the 'preplan') before the planning and chain-of-thought execution stages in LLM mathematical reasoning. The preplan captures problem type, applicable tools, and foreseeable pitfalls, addressing a gap in existing plan-based methods that only address 'how' to solve without first clarifying 'what' to solve. A three-stage synthesis pipeline with a spoiler-score detector and composite GRPO reward ensures clean preplan supervision and coherent plan generation. Evaluated across four backbones and five math benchmarks, PPC achieves best results on 39 of 40 metrics with +2.23 maj@16 and +3.06 pass@16 improvements over the strongest baseline at no additional inference token cost.
Diffusion-Proof: First framework applying diffusion LLMs to formal theorem proving
Researchers introduce Diffusion-Proof, the first framework to train and apply diffusion language models (dLLMs) for formal theorem proving, addressing limitations of autoregressive (AR) models in long-range coherence. The framework includes dLLM-Prover-7B for whole-proof generation and dLLM-Corrector-7B for local proof correction via bidirectional infilling. Diffusion-Proof achieves absolute improvements of 1.61% on ProofNet-Test and 6.14% on MiniF2F-Test over an AR baseline, and solves one IMO problem that DeepSeek-Prover-V2-7B could not. The result suggests dLLMs may have structural advantages over AR models for tasks requiring long-range logical coherence.
PAC-Bayes analysis establishes formal expressivity and alignment floors for prompt-conditioned LLMs
A new arXiv preprint models user-LLM interaction as a bilevel cheap-talk game and derives PAC-Bayes bounds showing two irreducible limitations: an 'expressivity floor' where language's finite channel capacity makes distinct tasks indistinguishable, and an 'objective-misalignment floor' where alignment constraints prevent reaching user-ideal outputs. The authors prove that prompt-conditioned LLMs cannot be universal problem solvers, as correct behavior on certain task families is provably unattainable even with infinite data, optimal training, or model scaling. The work suggests multimodal inputs and external memory as potential mitigations by increasing task-relevant information bandwidth.
Learning to Reason with LLMs
In a blog post published on September 12, 2024, OpenAI describes advances in training LLMs to perform complex multi-step reasoning. The post likely corresponds to the release of the o1 model series (formerly codenamed 'Strawberry'), which uses chain-of-thought reasoning trained via reinforcement learning to achieve significantly improved performance on math, science, and coding benchmarks.
AIR: Adaptive Interleaved Reasoning with Code in Multimodal LLMs via Reinforcement Learning
Researchers propose AIR, a system that trains multimodal large language models to adaptively interleave reasoning with code execution for numerical computation tasks, going beyond prior work that focused only on visual operations. The approach combines a two-stage cold-start data pipeline, RL dataset filtering, and a group-constrained reward function for tool-invocation decisions. Experiments show a 6.1 percentage point average improvement on evaluation benchmarks, with interleaved reasoning samples gaining 9.9 pp and tool-use success exceeding 95%.

