Almanac
← Events
6Hugging Face Blog·1mo ago

Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models

Kimina-Prover is a new large formal reasoning model that combines reinforcement learning with test-time search to improve mathematical theorem proving. The approach applies RL-trained search strategies at inference time, targeting formal proof generation in systems like Lean. The work is published via the AI-MO (AI for Math Olympiad) team on Hugging Face, continuing the trend of applying RL and extended compute at test time to hard reasoning tasks.

Related guides (3)

Related events (8)

5Hugging Face Blog·1mo ago·source ↗

Kimina-Prover-RL: Reinforcement Learning for Formal Mathematical Proving

Hugging Face blog post introduces Kimina-Prover-RL, a model trained with reinforcement learning targeting formal mathematical theorem proving. The post appears to describe a system from the AI-MO (AI for Math Olympiad) initiative. This represents a development in applying RL to formal proof generation, a competitive area involving Lean/Mathlib-style verification environments.

6arXiv · cs.CL·1mo ago·source ↗

Probe Trajectories Reveal Reasoning Dynamics in Large Reasoning Models

This paper investigates whether hidden representations of Large Reasoning Models (LRMs) can predict future model behavior by analyzing probe trajectories—the continuous evolution of concept probabilities across Chain-of-Thought reasoning tokens. The authors find that temporal trajectory features (volatility, trend, steady-state) significantly outperform single static probes, with max-pooling achieving up to 95% AUROC across safety and mathematics domains. Two methodological insights are offered: template-based training data matches dynamically generated responses in quality, and pooling strategy is critical to probe performance. The work positions probe trajectories as a complementary safety monitoring framework for LRMs where CoT faithfulness cannot be assumed.

8arXiv · cs.AI·29d ago·source ↗

Large-Scale Evaluation of LLM-Driven Formal Proof Search on Open Mathematical Problems

Researchers present the first large-scale evaluation of LLM-based formal proof search on genuinely open mathematical problems, using Lean as a verification backend. Their most capable agent autonomously resolved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures, at a cost of a few hundred dollars per problem. The system is already being deployed in active research across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics. The study also compares agent architectures, finding that more sophisticated designs outperform simple generate-and-verify loops on the hardest problems.

6arXiv · cs.LG·4d ago·source ↗

ExpRL: RL-based mid-training using human QA data as reward scaffolds for LLM reasoning

ExpRL proposes an automated approach to LLM mid-training that replaces manually curated reasoning traces with large corpora of human-written QA data used as reward scaffolds rather than imitation targets. Reference solutions are hidden from the policy and used only to construct problem-specific grading rubrics, enabling dense process-level rewards that reinforce partial progress and intermediate reasoning steps. On challenging math reasoning benchmarks, ExpRL outperforms SFT, sparse-reward GRPO, and self-distillation as an RL initialization strategy, with additional mixed-domain experiments suggesting broader applicability.

6Qwen Research·1mo ago·source ↗

Qwen2.5-Math Process Reward Model for Mathematical Reasoning Supervision

Alibaba's Qwen team introduces a process reward model (PRM) aimed at improving the reliability of mathematical reasoning in LLMs by supervising intermediate reasoning steps rather than only final answers. The work addresses the problem of models producing plausible but flawed intermediate derivations even when reaching correct conclusions. The release includes model weights on HuggingFace and ModelScope alongside a GitHub repository.

7Openai Blog·1mo ago·source ↗

OpenAI Neural Theorem Prover Solves Formal Math Olympiad Problems in Lean

OpenAI developed a neural theorem prover integrated with the Lean proof assistant that can solve challenging high-school olympiad problems, including problems from AMC12, AIME, and two IMO-adapted problems. The system demonstrates automated formal mathematical reasoning at a level previously requiring human expertise. This represents a significant capability milestone in AI-assisted formal verification and mathematical problem-solving.

6arXiv · cs.AI·22d ago·source ↗

Reasoning in Memory (RiM): Latent Reasoning via Working Memory Blocks in LLMs

RiM introduces a latent reasoning method that replaces autoregressive chain-of-thought token generation with fixed sequences of special 'memory block' tokens, allowing LLMs to perform internal computation without externalizing intermediate steps. These memory blocks are processed in a single forward pass rather than generated autoregressively, improving compute efficiency at test time. Training uses a two-stage curriculum: first grounding memory blocks by predicting explicit reasoning steps, then discarding step-level supervision and refining answers iteratively. Experiments across multiple model families and sizes show RiM matches or exceeds existing latent reasoning methods.

5Openai Blog·1mo ago·source ↗

Generative Language Modeling for Automated Theorem Proving

OpenAI published research on applying generative language models to automated theorem proving, an early exploration of using neural language models to assist formal mathematical reasoning. The work investigates how language models can generate proof steps or complete proofs in formal systems. This represents an early milestone in AI-assisted mathematical reasoning, predating later work like GPT-f and subsequent theorem-proving systems.