Informath: Symbolic informalization for converting formal proofs to fluent natural language
The paper introduces Informath, a project for symbolic informalization — converting formally verified mathematics into readable natural language without loss of precision. The architecture uses Dedukti as an interlingua hub connecting proof systems (Agda, Lean, Rocq) and Grammatical Framework (GF) for multilingual natural language generation. The work is relevant to AI-assisted formal verification pipelines where autoformalization produces machine-checked proofs that need to be made human-interpretable.
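To make the task concrete, here is a minimal hand-written illustration in Lean 4 of a formal statement alongside the kind of English sentence that informalization aims for. It is not output from Informath, and the theorem name `add_comm_example` is hypothetical; only `Nat.add_comm` comes from Lean's core library.

```lean
-- Hand-written illustration; this is NOT produced by Informath.
-- Formal side: a machine-checked statement and proof (Lean 4 core, no Mathlib).
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Informal side: one possible natural-language rendering of the statement.
--   "For all natural numbers a and b, a + b = b + a."
```

The point of the hub design described above is that such a rendering would be derived from the formal term itself rather than written by hand, so the English sentence cannot drift from what was actually verified.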
Related events (8)
Formal theory shows infinite trivial output is provably necessary for AI systems generating valuable mathematics
A new arXiv paper models AI-assisted formal mathematics generation as a nested language-generation-in-the-limit problem, using a proof checker as a membership oracle and an adversarial enumeration of the mathematical literature as the signal for 'valuable' content. The authors prove a sharp dichotomy: generators emitting only finitely many trivial (correct but worthless) statements achieve at most α/2 coverage of unseen valuable mathematics, while allowing an infinite (but asymptotically vanishing) stream of trivia raises the optimum to 1−α/2. The central result is that a perfect verifier cannot substitute for mathematical taste, and the flood of certified-but-trivial output from AI proof systems is a provable mathematical necessity, not an engineering failure. The work formalizes the gap between formal verifiability and mathematical value, which is increasingly the binding constraint as AI-proof-assistant systems scale.
Generative Language Modeling for Automated Theorem Proving
OpenAI published research on applying generative language models to automated theorem proving, an early exploration of using neural language models to assist formal mathematical reasoning. The work investigates how language models can generate proof steps or complete proofs in formal systems. The paper introduced GPT-f, an automated prover and proof assistant for the Metamath formalization language, making it an early milestone in AI-assisted mathematical reasoning that preceded later neural theorem-proving systems.
Goedel-Architect achieves state-of-the-art formal theorem proving with blueprint-based agentic framework
Goedel-Architect is an agentic framework for formal theorem proving in Lean 4 that uses blueprint generation — a dependency graph of definitions and lemmas — rather than recursive decomposition, enabling parallel lemma closure and global refinement. Built on DeepSeek-V4-Flash (284B-A13B), it achieves 99.2% pass@1 on MiniF2F-test and 75.6% on PutnamBench, scaling to 100% on MiniF2F, 88.8% on PutnamBench, and 4/6 on IMO 2025 when seeded with natural-language proofs. The authors claim state-of-the-art performance for an open-source pipeline at up to 500x lower cost than comparable systems.
OpenAI Neural Theorem Prover Solves Formal Math Olympiad Problems in Lean
OpenAI developed a neural theorem prover integrated with the Lean proof assistant that can solve challenging high-school olympiad problems, including problems from AMC12, AIME, and two IMO-adapted problems. The system demonstrates automated formal mathematical reasoning at a level previously requiring human expertise. This represents a significant capability milestone in AI-assisted formal verification and mathematical problem-solving.
Kimina-Prover-RL: Reinforcement Learning for Formal Mathematical Proving
A Hugging Face blog post introduces Kimina-Prover-RL, a model trained with reinforcement learning for formal mathematical theorem proving. The post appears to describe a system from the AI-MO (AI for Math Olympiad) initiative. It is a development in applying RL to formal proof generation, a competitive area built around Lean/Mathlib-style verification environments.
AI-Assisted Theorem Proving in Lean 4: Aristotle API Case Study on IMO 2009 Problem 6
This paper presents a case study of using the Aristotle API for AI-assisted formal theorem proving in Lean 4, targeting the Grasshopper problem (IMO 2009 Problem 6). The generated artifact verifies four helper lemmas but leaves the main theorem unresolved via a 'sorry' placeholder, exposing a key limitation: local proof search can succeed while global combinatorial bookkeeping remains unsolved. The study provides a reproducible Lean artifact and precise analysis distinguishing verified from unverified proof content, offering a concrete benchmark for evaluating AI formalization capabilities.
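The distinction between verified and unverified content can be sketched in a few lines of Lean 4. This is a hypothetical miniature, not the artifact from the paper: the names `helper` and `main_claim` and both statements are invented for illustration.

```lean
-- Hypothetical miniature; NOT the Grasshopper artifact from the paper.

-- A helper lemma with a complete proof: Lean checks it fully.
theorem helper (n : Nat) : n + 0 = n :=
  Nat.add_zero n

-- The main statement compiles only with a warning, because `sorry`
-- stands in for the proof that has not been found.
theorem main_claim (a b c : Nat) : a + b + c = c + b + a := by
  sorry

-- `#print axioms` lists the axioms a declaration depends on.
-- Any declaration that relies on `sorry` reports `sorryAx`.
#print axioms helper
#print axioms main_claim
```

A file like this still compiles, which is why an audit of a generated artifact needs to check for `sorry` (or for `sorryAx` in the axiom list) rather than treating a successful build as proof that the main theorem is established.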
COMPOSE: Dual-Graph Framework for Generating Future Mathematical Theorems from Citations and Formal Structure
COMPOSE is a framework that generates plausible future mathematical theorem-like claims by conditioning a language model on both a scientific citation graph and a formal theorem dependency graph simultaneously. The authors construct a dataset of 108K paired scientific-formal graph examples from arXiv and Mathlib, plus a benchmark of 47K future papers from 2024–2025. Experiments show COMPOSE outperforms baselines on retrieval to real future papers and LLM-judge evaluation, producing more grounded and mathematically richer outputs. The work advances AI-assisted mathematical reasoning by combining informal scientific context with formal proof structure.
Iteris: Agentic Research Loops for Computational Mathematics
Iteris is an agentic AI research system designed to tackle open problems in computational mathematics, combining numerical experimentation, adversarial construction, and algorithm design within an automated loop. Applied to two open problems from a Simons Workshop collection, Iteris produced numerical evidence, constructions, and proof drafts that—after expert review—yielded verified results: a phase diagram comparing conjugate gradient vs. randomized coordinate descent, and a counterexample to QR factorization with column pivoting under low coherence. The paper argues that agentic AI can meaningfully participate in mathematical research workflows while human validation remains essential.