5arXiv cs.AI (Artificial Intelligence)·38h ago

G-RRM: Neuro-symbolic approach guides SAT solvers with recurrent neural reasoning models

Researchers introduce G-RRM (Guiding with Recurrent Reasoning Models), a neuro-symbolic framework that uses symbol-equivariant recurrent reasoning models (SE-RRMs) to guide classical symbolic solvers on constraint satisfaction problems. The solvers tested are a backtracking solver and the SAT solvers Glucose 4.1 and CaDiCaL 3.0.0. On 9×9 Sudoku, the approach achieves a 33.3× speedup for backtracking and 1.70× for Glucose 4.1, but shows no significant gain for CaDiCaL, whose runtime is dominated by overhead and which cannot override injected hints. The paper identifies two conditions for neural guidance to be effective: a large combinatorial search space and a solver architecture capable of dynamically overriding imperfect neural hints.
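
The second condition, a solver that can override imperfect hints, can be illustrated with a minimal sketch. This is not the paper's G-RRM implementation: the `hints` dictionary below is a hypothetical stand-in for model predictions, the `solve` and `allowed` helpers are illustrative names, and the solver is a plain backtracking search on a Sudoku grid. A hint only changes which value is tried first, so a wrong hint costs extra search but never blocks a solution.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]  # 0 marks an empty cell
Hints = Dict[Tuple[int, int], int]  # hypothetical: (row, col) -> preferred value


def allowed(grid: Grid, r: int, c: int, v: int, box: int) -> bool:
    """Check the row, column, and box constraints for placing v at (r, c)."""
    n = box * box
    if any(grid[r][j] == v for j in range(n)):
        return False
    if any(grid[i][c] == v for i in range(n)):
        return False
    br, bc = r - r % box, c - c % box
    return all(
        grid[i][j] != v
        for i in range(br, br + box)
        for j in range(bc, bc + box)
    )


def solve(grid: Grid, box: int, hints: Optional[Hints] = None) -> int:
    """Solve in place by backtracking; return the number of placements tried.

    Raises ValueError if the puzzle has no solution.
    """
    n = box * box
    hints = hints or {}
    tried = 0

    def step() -> bool:
        nonlocal tried
        for r in range(n):
            for c in range(n):
                if grid[r][c] != 0:
                    continue
                values = list(range(1, n + 1))
                hint = hints.get((r, c))
                if hint in values:
                    # Try the hinted value first. The other values stay in
                    # the list, so backtracking can override a wrong hint.
                    values.remove(hint)
                    values.insert(0, hint)
                for v in values:
                    if allowed(grid, r, c, v, box):
                        tried += 1
                        grid[r][c] = v
                        if step():
                            return True
                        grid[r][c] = 0  # undo and try the next value
                return False  # no value fits this cell
        return True  # no empty cell left

    if not step():
        raise ValueError("puzzle has no solution")
    return tried
```

Calling `solve(grid, 3)` runs the unguided search on a 9×9 grid, and `solve(grid, 3, hints)` runs the guided one; comparing the two returned counts gives a rough proxy for how much search the hints saved. Runtime speedups such as those reported above depend on the solver and hardware and are not reproduced by this sketch.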

Evaluation and Benchmarking SE-RRM Glucose 4.1 CaDiCaL 3.0.0 G-RRM

Related guides (1)

Evaluation and Benchmarking · Topic guide

AI Evaluation and Benchmarking: From Leaderboards to the Limits of Measurement


Related events (8)

4arXiv · cs.CL·5d ago·source ↗

SD-GPS: Solver-Driven Autoformalization and Theorem Proposing for Geometry Problem Solving

Researchers propose SD-GPS, a neuro-symbolic framework for geometry problem solving that treats a symbolic solver as an execution oracle during both formalization and deduction stages. The system combines solvability-guided reinforcement learning for autoformalization (built on Qwen3-VL-2B) with an impasse-aware agent that proposes and symbolically verifies auxiliary lemmas. Evaluations on Geometry3K and PGPS9K show SD-GPS outperforms existing multimodal, neural, and neuro-symbolic baselines across multiple task regimes. The work advances the line of research on grounding neural agents in formal systems for verifiable mathematical reasoning.

Evaluation and Benchmarking Multimodal Progress PGPS9K Geometry3K Qwen-3-VL-2B +1 more

6arXiv · cs.AI·May 29, 2026·source ↗

Reasoning in Memory (RiM): Latent Reasoning via Working Memory Blocks in LLMs

RiM introduces a latent reasoning method that replaces autoregressive chain-of-thought token generation with fixed sequences of special 'memory block' tokens, allowing LLMs to perform internal computation without externalizing intermediate steps. These memory blocks are processed in a single forward pass rather than generated autoregressively, improving compute efficiency at test time. Training uses a two-stage curriculum: first grounding memory blocks by predicting explicit reasoning steps, then discarding step-level supervision and refining answers iteratively. Experiments across multiple model families and sizes show RiM matches or exceeds existing latent reasoning methods.

Evaluation and Benchmarking Inference Economics latent reasoning Chain-of-Thought Reasoning Reasoning in Memory (RiM) +3 more

7arXiv · cs.LG·May 21, 2026·source ↗

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

This paper introduces Equilibrium Reasoners (EqR), a framework that formalizes test-time compute scaling through learned task-conditioned attractors in latent space, where stable fixed points correspond to valid solutions. EqR scales along two axes—depth (more iterations) and breadth (aggregating stochastic trajectories)—without requiring external verifiers or task-specific priors. On Sudoku-Extreme, unrolling up to 40,000 equivalent layers boosts accuracy from 2.6% (feedforward baseline) to over 99%. The work provides a mechanistic lens for understanding why iterative latent models generalize beyond memorized patterns.

Long Context Evolution Evaluation and Benchmarking task-conditioned attractors latent dynamical systems Sudoku-Extreme +3 more

7OpenAI Blog·May 20, 2026·source ↗

Improving Mathematical Reasoning with Process Supervision

OpenAI trained a model achieving state-of-the-art mathematical problem solving by rewarding each correct reasoning step (process supervision) rather than only the final answer (outcome supervision). This approach improves performance on math benchmarks and carries an alignment benefit by training models to produce human-endorsed chain-of-thought reasoning. The work highlights a potential synergy between capability improvements and alignment techniques.

Frontier Model Releases Evaluation and Benchmarking process supervision outcome supervision Chain-of-Thought Reasoning +3 more

6arXiv · cs.AI·Jun 12, 2026·source ↗

RA-RFT: Retrieval-Augmented Reinforcement Fine-Tuning teaches LLMs to reason by analogy

Researchers propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that trains a retriever to rank contexts by expected reasoning benefit rather than semantic similarity, then fine-tunes a policy model via reinforcement learning using retrieved analogous demonstrations. The key insight is that reasoning-relevant retrieval surfaces complementary solution strategies rather than superficially similar problems. On mathematical reasoning benchmarks, RA-RFT improves AIME 2025 average@32 accuracy by 7.1 and 2.8 points over GRPO for Qwen3-1.7B and Qwen3-4B respectively, suggesting reasoning-aware retrieval is orthogonal to reward design and training curriculum improvements.
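
The summary does not define average@32. A minimal sketch, assuming it denotes accuracy averaged over 32 sampled completions per problem and then over problems (the usual reading of avg@k); the function name `average_at_k` and its input layout are illustrative and not taken from the paper:

```python
from typing import Sequence


def average_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Return mean per-sample accuracy as a percentage.

    correct[i][j] is True when sample j for problem i is right.
    Assumption: every problem has at least one sample.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * sum(per_problem) / len(per_problem)
```

Under this reading, a gain of 7.1 points is a difference between two such percentages, not a relative improvement.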

Evaluation and Benchmarking Alignment and RLHF RA-RFT Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning GRPO +3 more

6arXiv · cs.AI·Jun 8, 2026·source ↗

MemDreamer: Hierarchical graph memory and agentic retrieval for long video understanding

MemDreamer is a plug-and-play framework that decouples perception and reasoning for long-video understanding by incrementally building a three-tier Hierarchical Graph Memory capturing spatiotemporal and causal relations. During inference, a reasoning model uses an Observation-Reason-Action loop with agentic tool-augmented retrieval to navigate the memory graph, constraining the context window to 2% of full-context ingestion while achieving a 12.5-point absolute accuracy gain. The system reaches SOTA on four benchmarks, narrowing the gap with human experts to 3.7 points. The authors also report a strong linear correlation between logical reasoning performance and long-video understanding, proposing agentic capability scaling as a new paradigm for multimodal comprehension.

Long Context Evolution Agent and Tool Ecosystem MemDreamer Hierarchical Graph Memory Observation-Reason-Action +1 more

5arXiv · cs.AI·Jun 23, 2026·source ↗

AIR: Adaptive Interleaved Reasoning with Code in Multimodal LLMs via Reinforcement Learning

Researchers propose AIR, a system that trains multimodal large language models to adaptively interleave reasoning with code execution for numerical computation tasks, going beyond prior work that focused only on visual operations. The approach combines a two-stage cold-start data pipeline, RL dataset filtering, and a group-constrained reward function for tool-invocation decisions. Experiments show a 6.1 percentage point average improvement on evaluation benchmarks, with interleaved reasoning samples gaining 9.9 pp and tool-use success exceeding 95%.

Agent and Tool Ecosystem Alignment and RLHF AIR: Adaptive Interleaved Reasoning with Code in MLLMs OpenAI +1 more

5arXiv · cs.AI·Jun 18, 2026·source ↗

MAST: Mechanism-guided selective unlearning for RLVR-trained reasoning models

Researchers introduce MAST (Mechanism-Aligned Selective Targeting), a method for selectively unlearning capabilities induced by reinforcement learning from verifiable rewards (RLVR) in language models while minimizing collateral damage to retained knowledge. The approach ranks attention-projection tensors by off-principal energy and gradient coupling to identify a targeted subset for update, rather than applying full-parameter gradient ascent. Evaluated on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, MAST achieves statistically significant forgetting on target MATH problems while preserving GSM8K performance, whereas full-parameter unlearning collapses retained capabilities. The method generalizes across seeds and unlearning objectives (NPO/SimNPO).

AI Safety Research Alignment and RLHF Qwen3-1.7B-Base MATH MAST +2 more
