arXiv · cs.AI (Artificial Intelligence) · 3d ago

Fixed-Point Reasoning Model (FPRM): Stable looped Transformers with adaptive compute via fixed-point halting

Researchers introduce FPRM, a Transformer-based Fixed-Point Reasoning Model that uses convergence to a fixed point as the halting criterion for a looped architecture, and addresses signal-propagation problems with pre-norm layers and residual scaling. Looped architectures provide a useful inductive bias for compositional reasoning, but they suffer from depth-induced signal degradation when halting is deferred. FPRM addresses this degradation while letting compute scale with task difficulty. The model is evaluated on Sudoku, Maze, state-tracking, and ARC-AGI benchmarks. The work adds to the growing body of research on adaptive-compute and iterative-refinement architectures for reasoning.

Evaluation and Benchmarking · Fixed-Point Reasoning Model · Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers · ARC-AGI
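The fixed-point halting idea in FPRM can be illustrated with a minimal sketch built on loud assumptions. The shared looped block is replaced by a toy contractive map `tanh(W h + x)` instead of FPRM's pre-norm Transformer block, and the damping factor `alpha` only loosely stands in for residual scaling. The loop stops once successive states differ by less than `tol`, so the number of iterations is decided per input rather than fixed in advance. All function names are hypothetical.

```python
import math
import random

def make_toy_block(dim, scale=0.5, seed=0):
    """Hypothetical stand-in for a shared looped block: h -> tanh(W h + x).

    Every entry of W is bounded by scale/dim, so ||W||_2 <= ||W||_F <= scale < 1.
    Since tanh is 1-Lipschitz, the map is a contraction and has a unique fixed point.
    """
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) * scale / dim for _ in range(dim)] for _ in range(dim)]

    def block(h, x):
        return [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + xi)
                for row, xi in zip(w, x)]
    return block

def run_to_fixed_point(block, x, alpha=0.5, tol=1e-6, max_steps=500):
    """Iterate the damped update h <- (1 - alpha) * h + alpha * block(h, x).

    Halts when ||h_{t+1} - h_t|| < tol and returns the state plus the number of
    iterations used, so compute adapts to how quickly each input converges.
    """
    h = [0.0] * len(x)
    for step in range(1, max_steps + 1):
        g = block(h, x)
        h_next = [(1 - alpha) * hi + alpha * gi for hi, gi in zip(h, g)]
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_next, h)))
        h = h_next
        if delta < tol:
            return h, step
    return h, max_steps

block = make_toy_block(dim=8)
x = [0.3 * i for i in range(8)]
h_star, steps = run_to_fixed_point(block, x)
# How far h_star is from satisfying h = block(h, x).
residual = max(abs(a - b) for a, b in zip(block(h_star, x), h_star))
```

Because the stopping point depends on how fast each input converges, `steps` can differ between inputs. A trained model would additionally need its fixed point to encode a correct answer, which this toy does not address.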

Related guides (1)

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

arXiv · cs.LG · 10d ago

Future Probe Controlled Generation steers reasoning models with minimal quality degradation

Researchers introduce Future Probe Controlled Generation (FPCG), a text-level steering method for large reasoning models (LRMs). Rather than detecting a behavior in text that has already been generated, FPCG trains activation probes on intermediate reasoning steps to predict how likely each behavior is to occur later. The probes predict the most likely future behavior with 64–91% accuracy, revealing a distinct class of internal prediction features separate from detection features. FPCG steers generation by sampling candidate sentences and keeping the one the probes rate best. This achieves steering with minimal loss of output quality and succeeds in cases where activation steering fails. The work offers a principled distinction between detection and prediction features as intervention targets for controlling LRM behavior.

Frontier Model Releases · AI Safety Research · Predicting Future Behaviors in Reasoning Models Enables Better Steering · Future Probe Controlled Generation
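The FPCG selection step (sample candidate sentences, score each with a probe, keep the best) can be sketched as follows. This is a minimal illustration under stated assumptions: the probe is a hypothetical binary logistic probe for a single target behavior, whereas the summary describes probes that predict the most likely of several behaviors. `get_activation` is also hypothetical. In a real system it would return the model's hidden state after appending a candidate; here it reads from a toy dictionary.

```python
import math
from typing import Callable, List, Sequence

def probe_score(activation: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear probe: estimated probability that the target behavior occurs later."""
    z = sum(a * w for a, w in zip(activation, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def select_candidate(
    candidates: List[str],
    get_activation: Callable[[str], Sequence[float]],
    weights: Sequence[float],
    bias: float,
    maximize: bool = True,
) -> str:
    """Return the candidate whose activation the probe rates best.

    maximize=True steers toward the predicted behavior; False steers away from it.
    """
    scores = [probe_score(get_activation(c), weights, bias) for c in candidates]
    pick = max if maximize else min
    best = pick(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

# Toy activations standing in for hidden states after each sampled candidate sentence.
toy_acts = {
    "Let me double-check that step.": [1.0, 0.2],
    "So the answer is 12.": [-0.5, 0.1],
    "Alternatively, try factoring.": [0.3, 0.9],
}
weights, bias = [2.0, 0.5], 0.0
toward = select_candidate(list(toy_acts), toy_acts.__getitem__, weights, bias)
away = select_candidate(list(toy_acts), toy_acts.__getitem__, weights, bias, maximize=False)
```

Steering happens purely through the choice among sampled sentences, so the model's activations are never edited. This is what separates text-level steering from activation steering.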

arXiv · cs.LG · 1mo ago

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

This paper introduces Equilibrium Reasoners (EqR), a framework that formalizes test-time compute scaling through learned task-conditioned attractors in latent space, where stable fixed points correspond to valid solutions. EqR scales along two axes—depth (more iterations) and breadth (aggregating stochastic trajectories)—without requiring external verifiers or task-specific priors. On Sudoku-Extreme, unrolling up to 40,000 equivalent layers boosts accuracy from 2.6% (feedforward baseline) to over 99%. The work provides a mechanistic lens for understanding why iterative latent models generalize beyond memorized patterns.

Long Context Evolution · Evaluation and Benchmarking · task-conditioned attractors · latent dynamical systems · Sudoku-Extreme
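The two scaling axes in EqR (depth as more iterations, breadth as aggregating stochastic trajectories) can be illustrated with a toy. This is a minimal sketch, not EqR's method. The "attractor" here is a hard-coded minimum of a quadratic energy, labeled `target`, whereas EqR learns task-conditioned dynamics in latent space. Decoding by rounding and aggregating by majority vote are also assumptions. `step` and `solve` are hypothetical names.

```python
import random
from collections import Counter

def step(h, target, eta, sigma, rng):
    """One noisy descent step on the toy energy E(h) = 0.5 * ||h - target||^2."""
    return [hi - eta * (hi - ti) + rng.gauss(0.0, sigma) for hi, ti in zip(h, target)]

def solve(target, depth, breadth, eta=0.2, sigma=0.05, seed=0):
    """Run `breadth` stochastic trajectories for `depth` steps each and majority-vote.

    `target` stands in (as an assumption) for a learned, task-conditioned attractor.
    Returns the winning decoded answer and the fraction of trajectories agreeing with it.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(breadth):
        h = [rng.uniform(-5, 5) for _ in target]      # random initial latent state
        for _ in range(depth):                        # depth axis: more iterations
            h = step(h, target, eta, sigma, rng)
        votes[tuple(round(v) for v in h)] += 1        # decode by rounding, then vote
    answer, count = votes.most_common(1)[0]           # breadth axis: aggregate trajectories
    return answer, count / breadth

answer, agreement = solve([3.0, -1.0, 4.0], depth=30, breadth=9)
```

With too few steps, trajectories have not yet settled into the basin and their decoded answers scatter. With enough steps, they agree, and the vote over trajectories absorbs the residual noise.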

arXiv · cs.LG · 26d ago

Training-Free Looped Transformers: Inference-Time Recurrence via ODE-Motivated Layer Reapplication

The paper introduces a method to retrofit recurrence onto frozen pretrained transformer checkpoints at inference time by looping a contiguous mid-stack block of layers without any fine-tuning or architectural changes. Naive block reapplication degrades performance, so the authors motivate their approach by treating pre-norm transformer blocks as forward Euler ODE steps and replacing one large update with smaller damped sub-steps. Evaluated across seven model families including dense, sparse MoE, and MLA+MoE architectures, the method yields consistent benchmark improvements (e.g., +2.64 pp on MMLU-Pro for Qwen3-4B-Instruct) at no training cost.

Frontier Model Releases · Inference Economics · CommonsenseQA · OpenBookQA · Forward Euler ODE
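Under the ODE view in the summary, a pre-norm block computes `h + F(LN(h))`, which is one forward Euler step of size 1 on `dh/dt = F(LN(h))`. Splitting that update into smaller sub-steps tracks the continuous flow more closely. The sketch below uses a toy `block_update` with a known exact solution (`dh/dt = -h`, so `h(1) = h(0) * e^-1`) instead of a real Transformer layer. The function names are hypothetical. Treating `total_time < 1.0` as a form of damping is an assumption, because the paper's exact sub-step schedule is not given here.

```python
import math
from typing import Callable, List

Vector = List[float]

def euler_substeps(h: Vector, block_update: Callable[[Vector], Vector],
                   n_substeps: int, total_time: float = 1.0) -> Vector:
    """Integrate dh/dt = block_update(h) over total_time using n_substeps Euler steps.

    n_substeps=1 with total_time=1.0 reproduces one ordinary residual application
    h + block_update(h); larger n_substeps uses steps of size total_time / n_substeps.
    """
    dt = total_time / n_substeps
    for _ in range(n_substeps):
        update = block_update(h)
        h = [hi + dt * ui for hi, ui in zip(h, update)]
    return h

def loop_block(h: Vector, block_update: Callable[[Vector], Vector],
               n_loops: int, n_substeps: int) -> Vector:
    """Hypothetical inference-time recurrence: reapply the same block n_loops times."""
    for _ in range(n_loops):
        h = euler_substeps(h, block_update, n_substeps)
    return h

def decay(v: Vector) -> Vector:
    """Toy block whose exact flow is h(t) = h(0) * exp(-t)."""
    return [-x for x in v]

h0 = [1.0, -2.0]
one_big_step = euler_substeps(h0, decay, n_substeps=1)    # overshoots and collapses the state to zero
fine_steps = euler_substeps(h0, decay, n_substeps=100)    # close to h0 * exp(-1)
looped_twice = loop_block(h0, decay, n_loops=2, n_substeps=100)
```

In this toy, one full-size step discards the state entirely, while many small steps stay near the exact solution. That contrast mirrors the summary's point that naive reapplication degrades performance while smaller sub-steps do not.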

arXiv · cs.CL · 22d ago

PPC: Preplan-Plan-CoT Framework for LLM Mathematical Reasoning

This paper introduces PPC (Preplan-Plan-CoT), a framework for LLM mathematical reasoning that adds an explicit problem-understanding stage (the 'preplan') before the planning and chain-of-thought execution stages. The preplan captures the problem type, applicable tools, and foreseeable pitfalls. This fills a gap in existing plan-based methods, which specify 'how' to solve a problem without first clarifying 'what' is being solved. A three-stage synthesis pipeline with a spoiler-score detector and a composite GRPO reward provides clean preplan supervision and coherent plan generation. Evaluated across four backbones and five math benchmarks, PPC achieves the best results on 39 of 40 metrics. It improves on the strongest baseline by +2.23 maj@16 and +3.06 pass@16 at no additional inference-token cost.

Evaluation and Benchmarking · Agent and Tool Ecosystem · spoiler-score detector · GRPO · Chain-of-Thought Reasoning
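PPC's maj@16 and pass@16 results are per-problem metrics averaged over a benchmark. The sketch below shows how they are commonly computed, assuming 16 sampled final answers per problem and exact string matching against the reference. The paper's answer normalization is not described in this summary, and the function names are illustrative.

```python
from collections import Counter
from typing import Sequence

def pass_at_k(answers: Sequence[str], reference: str) -> bool:
    """pass@k with exactly k samples: solved if any sampled answer is correct."""
    return any(a == reference for a in answers)

def maj_at_k(answers: Sequence[str], reference: str) -> bool:
    """maj@k: solved if the most frequent sampled answer is correct.

    Ties go to the answer encountered first (Counter.most_common ordering).
    """
    most_common, _ = Counter(answers).most_common(1)[0]
    return most_common == reference

def benchmark_score(per_problem: Sequence[bool]) -> float:
    """Percentage of problems solved under one metric."""
    return 100.0 * sum(per_problem) / len(per_problem)

# 16 sampled answers for one problem.
samples = ["42"] * 7 + ["41"] * 6 + ["40"] * 3
```

If the reference were "41", this problem would count under pass@16 but not under maj@16. That is why gains on the two metrics can differ, as they do in the reported +2.23 and +3.06.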

arXiv · cs.AI · 8d ago

RA-RFT: Retrieval-Augmented Reinforcement Fine-Tuning teaches LLMs to reason by analogy

Researchers propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that trains a retriever to rank contexts by expected reasoning benefit rather than semantic similarity, then fine-tunes a policy model via reinforcement learning using retrieved analogous demonstrations. The key insight is that reasoning-relevant retrieval surfaces complementary solution strategies rather than superficially similar problems. On mathematical reasoning benchmarks, RA-RFT improves AIME 2025 average@32 accuracy by 7.1 and 2.8 points over GRPO for Qwen3-1.7B and Qwen3-4B respectively, suggesting reasoning-aware retrieval is orthogonal to reward design and training curriculum improvements.

Evaluation and Benchmarking · Alignment and RLHF · RA-RFT · Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning · GRPO
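RA-RFT ranks contexts by expected reasoning benefit rather than by semantic similarity. The sketch below illustrates that difference with hypothetical data. "Benefit" is estimated as the solve-rate lift from prepending a context, measured over a few rollouts. This is one plausible way to build ranking targets and is an assumption, not the paper's stated procedure. The retriever itself, which would learn to predict this quantity, is not shown.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    name: str
    similarity: float          # e.g. embedding cosine similarity to the query (toy value)
    outcomes_with: List[bool]  # toy rollout results with this context prepended

def solve_rate(outcomes: List[bool]) -> float:
    return sum(outcomes) / len(outcomes)

def rank_by_benefit(cands: List[Candidate], baseline: List[bool]) -> List[str]:
    """Order contexts by estimated solve-rate gain over the no-context baseline."""
    base = solve_rate(baseline)
    ranked = sorted(cands, key=lambda c: solve_rate(c.outcomes_with) - base, reverse=True)
    return [c.name for c in ranked]

def rank_by_similarity(cands: List[Candidate]) -> List[str]:
    return [c.name for c in sorted(cands, key=lambda c: c.similarity, reverse=True)]

baseline = [True, False] * 4  # 4/8 solved without any retrieved context
cands = [
    Candidate("near-duplicate", 0.95, [True, True, False, False, True, False, True, False]),
    Candidate("same-technique", 0.40, [True] * 7 + [False]),
    Candidate("unrelated", 0.10, [True] * 3 + [False] * 5),
]
```

In this toy, the superficially similar problem adds nothing, while a less similar problem that shares the solution technique raises the solve rate. The two orderings therefore disagree at the top.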

arXiv · cs.AI · 17d ago

FFR extends the Forward-Forward algorithm to regression tasks, cutting peak training memory by 73% at depth 8

A new arXiv preprint introduces FFR (Forward-Forward for Regression), the first framework to adapt Hinton's Forward-Forward algorithm—a biologically plausible, backpropagation-free training method—to regression problems. FFR introduces an ordinal competitive goodness function, a stratified ladder architecture, and hierarchical prediction with uncertainty estimation to handle continuous target spaces. Across five real-world regression benchmarks, FFR recovers 98.6% of backpropagation accuracy while reducing peak training memory to 27% of BP's at depth 8 and 8% at depth 32, with per-iteration time around 72% of BP's.

Training Infrastructure · Evaluation and Benchmarking · Forward-Forward Algorithm · FFR: Forward-Forward Learning for Regression
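This summary does not spell out FFR's ordinal competitive goodness function. The sketch below therefore shows only the standard Forward-Forward building block that FFR extends, not FFR's regression objective. Goodness is the sum of squared ReLU activations, and each layer pushes goodness above a threshold `theta` for positive data and below it for negative data, using only its own input and output. Because no gradient flows between layers, earlier activations need not be kept for a backward pass, which is broadly where FF-style memory savings come from. `FFLayer` and its parameters are hypothetical.

```python
import math
import random

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class FFLayer:
    """One Forward-Forward layer trained with a purely local objective."""

    def __init__(self, n_in, n_out, theta=2.0, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.theta, self.lr = theta, lr

    def forward(self, x):
        return [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in self.w]

    def goodness(self, x):
        return sum(h * h for h in self.forward(x))

    def loss(self, x_pos, x_neg):
        # Positive data should score above theta, negative data below it.
        return (softplus(self.theta - self.goodness(x_pos))
                + softplus(self.goodness(x_neg) - self.theta))

    def local_step(self, x_pos, x_neg):
        """Gradient step on this layer's own loss; nothing is propagated to earlier layers."""
        for x, sign in ((x_pos, -1.0), (x_neg, 1.0)):
            h = self.forward(x)
            g = sum(v * v for v in h)
            # dL/dg is -sigmoid(theta - g) for positive data, +sigmoid(g - theta) for negative.
            dl_dg = sign * sigmoid(sign * (g - self.theta))
            # dg/dw_ij = 2 * h_i * x_j (the ReLU derivative is already folded into h_i).
            for i, hi in enumerate(h):
                for j, xj in enumerate(x):
                    self.w[i][j] -= self.lr * dl_dg * 2.0 * hi * xj

layer = FFLayer(3, 4)
x_pos, x_neg = [1.0, 0.0, 0.5], [0.0, 1.0, 0.0]
before = layer.loss(x_pos, x_neg)
for _ in range(200):
    layer.local_step(x_pos, x_neg)
after = layer.loss(x_pos, x_neg)
```

A deeper network would stack such layers, normalizing each layer's output before feeding the next. Each layer still trains only on its own local loss.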

arXiv · cs.CL · 12d ago

Prefix Utility Model (PUM) trains process reward models on outcome-grounded prefix gain rather than step correctness

A new arXiv preprint proposes replacing local step-correctness signals in process reward models with 'prefix gain' — the improvement in solve-rate induced by conditioning a student model on a given reasoning prefix. The authors train a Prefix Utility Model (PUM) using a pairwise ranking objective and evaluate it across Best-of-N selection, beam search, and RL on mathematical reasoning tasks. PUM shows particular strength when candidate pools are large, search budgets are high, or rule-based rewards are sparse. Code, data, and models are released publicly.

Evaluation and Benchmarking · Alignment and RLHF · From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning · Prefix Utility Model
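The two ingredients named in the PUM summary, prefix gain and a pairwise ranking objective, can be sketched on toy data. Under stated assumptions, gain is the difference between a student's solve rate with and without a given prefix, estimated from a few rollouts. The ranking loss is a Bradley-Terry style logistic loss over pairs ordered by gain. The paper's exact estimator and loss form are not given in this summary, so both functions are hypothetical.

```python
import math
from typing import List

def _rate(outcomes: List[bool]) -> float:
    return sum(outcomes) / len(outcomes)

def prefix_gain(solved_with_prefix: List[bool], solved_without: List[bool]) -> float:
    """Hypothetical prefix gain: solve-rate lift from conditioning the student on a prefix."""
    return _rate(solved_with_prefix) - _rate(solved_without)

def pairwise_ranking_loss(scores: List[float], gains: List[float]) -> float:
    """Average -log sigmoid(s_i - s_j) over all pairs where gain_i > gain_j."""
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if gains[i] > gains[j]:
                margin = scores[i] - scores[j]
                total += math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

baseline = [True, False, False, False]                  # 1/4 solved without any prefix
gain_a = prefix_gain([True, True, True, False], baseline)    # helpful prefix
gain_b = prefix_gain([False, False, False, False], baseline) # harmful prefix
gains = [gain_a, gain_b]
loss_good = pairwise_ranking_loss([2.0, -1.0], gains)   # scores agree with gains
loss_bad = pairwise_ranking_loss([-1.0, 2.0], gains)    # scores contradict gains
```

The model is rewarded only for ordering prefixes correctly, not for matching gain values exactly. This suits uses such as Best-of-N selection and beam search, where only the ranking of candidates matters.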

Qwen Research · 1mo ago

Qwen2.5-Math Process Reward Model for Mathematical Reasoning Supervision

Alibaba's Qwen team introduces a process reward model (PRM) aimed at improving the reliability of mathematical reasoning in LLMs by supervising intermediate reasoning steps rather than only final answers. The work addresses the problem of models producing plausible but flawed intermediate derivations even when reaching correct conclusions. The release includes model weights on HuggingFace and ModelScope alongside a GitHub repository.

Evaluation and Benchmarking · Open Weights Progress · Process Reward Model · Alibaba Qwen
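Step-level supervision changes which solution wins in a Best-of-N selection. The sketch below illustrates this with made-up per-step scores. It does not load the released Qwen weights, and obtaining real step scores is not shown. Aggregating step scores by minimum or product is a common convention assumed here, not a documented part of the Qwen release. `aggregate` and `best_of_n` are hypothetical names.

```python
import math
from typing import Dict, List

def aggregate(step_scores: List[float], how: str = "min") -> float:
    """Collapse per-step PRM scores into one solution-level score."""
    if how == "min":
        return min(step_scores)
    if how == "prod":
        return math.prod(step_scores)
    if how == "last":  # final step only, closest to outcome-only scoring
        return step_scores[-1]
    raise ValueError(f"unknown aggregation: {how}")

def best_of_n(candidates: Dict[str, List[float]], how: str = "min") -> str:
    return max(candidates, key=lambda name: aggregate(candidates[name], how))

# Made-up step scores: A has one flawed intermediate step but a confident ending.
candidates = {
    "A": [0.9, 0.95, 0.2, 0.9],
    "B": [0.8, 0.7, 0.75, 0.7],
}
```

Scoring only the last step prefers A, whose flawed derivation still ends confidently. Aggregating over all steps prefers B. This mirrors the failure mode the release targets: plausible but flawed intermediate derivations.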