4 · arXiv cs.CL (Computation and Language) · Jun 16, 2026

Informath: Symbolic informalization for converting formal proofs to fluent natural language

The paper introduces Informath, a project for symbolic informalization: converting formally verified mathematics into readable natural language without loss of precision. The architecture uses Dedukti as an interlingua hub that connects proof systems (Agda, Lean, Rocq), and uses Grammatical Framework (GF) to generate the natural-language text in multiple languages. The work is relevant to AI-assisted formal verification pipelines, where autoformalization produces machine-checked proofs that must then be made human-interpretable.

Informath · Grammatical Framework · Dedukti · Agda · Lean · Rocq
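The hub design summarized above can be illustrated with a toy sketch. It is not the Informath, Dedukti, or GF API: every name here (`HubTerm`, `from_lean_like`, `from_agda_like`, `render`, and the two lexicons) is hypothetical. It only shows why an interlingua helps: each proof system needs one importer into the hub, and each natural language needs one renderer out of it, instead of one translator per system-language pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubTerm:
    """Hypothetical hub form: a binary predicate applied to two names."""
    pred: str
    left: str
    right: str


def from_lean_like(src: str) -> HubTerm:
    """Toy importer for a prefix form such as 'Dvd a b'."""
    pred, left, right = src.split()
    return HubTerm(pred, left, right)


def from_agda_like(src: str) -> HubTerm:
    """Toy importer for an infix form such as 'a | b'."""
    left, op, right = src.split()
    return HubTerm({"|": "Dvd", "<=": "Le"}[op], left, right)


# One lexicon per output language (illustrative phrasings only).
ENGLISH = {"Dvd": "{l} divides {r}", "Le": "{l} is at most {r}"}
FRENCH = {"Dvd": "{l} divise {r}", "Le": "{l} est au plus {r}"}


def render(term: HubTerm, lexicon: dict) -> str:
    """Linearize a hub term using the chosen language's lexicon."""
    return lexicon[term.pred].format(l=term.left, r=term.right)


if __name__ == "__main__":
    for importer, src in [(from_lean_like, "Dvd a b"), (from_agda_like, "a | b")]:
        term = importer(src)
        print(render(term, ENGLISH), "/", render(term, FRENCH))
```

With two importers and two lexicons the sketch covers four system-language combinations. Adding a third proof system would mean writing one more importer, with no changes to the renderers.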

Related events (8)

6 · arXiv · cs.AI · Jun 15, 2026

Formal theory shows infinite trivial output is provably necessary for AI systems generating valuable mathematics

A new arXiv paper models AI-assisted formal mathematics generation as a nested language-generation-in-the-limit problem, using a proof checker as a membership oracle and an adversarial enumeration of the mathematical literature as the signal for 'valuable' content. The authors prove a sharp dichotomy: generators emitting only finitely many trivial (correct but worthless) statements achieve at most α/2 coverage of unseen valuable mathematics, while allowing an infinite (but asymptotically vanishing) stream of trivia raises the optimum to 1−α/2. The central result is that a perfect verifier cannot substitute for mathematical taste, and the flood of certified-but-trivial output from AI proof systems is a provable mathematical necessity, not an engineering failure. The work formalizes the gap between formal verifiability and mathematical value, which is increasingly the binding constraint as AI-proof-assistant systems scale.

Evaluation and Benchmarking · AI Safety Research · Angluin's condition · Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit · Language Generation in the Limit
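The size of the dichotomy reported in this summary follows from simple arithmetic on the two stated bounds. The summary does not define α, so the sketch below treats it as the paper's parameter and assumes 0 ≤ α ≤ 1; the value α = 0.4 is chosen here purely for illustration and does not come from the paper.

```latex
\[
\underbrace{\left(1 - \tfrac{\alpha}{2}\right)}_{\text{infinite, vanishing trivia}}
\;-\;
\underbrace{\tfrac{\alpha}{2}}_{\text{finitely many trivia}}
\;=\; 1 - \alpha ,
\qquad
\text{e.g. } \alpha = 0.4:\quad 0.8 - 0.2 = 0.6 .
\]
```

Under that assumption, the two bounds always sum to 1.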

5 · OpenAI Blog · May 20, 2026

Generative Language Modeling for Automated Theorem Proving

OpenAI published research on applying generative language models to automated theorem proving, an early exploration of using neural language models to assist formal mathematical reasoning. The work investigates how language models can generate proof steps or complete proofs in formal systems. This is the paper that introduced GPT-f, making it a milestone in AI-assisted mathematical reasoning that predates subsequent theorem-proving systems.

Frontier Model Releases · Evaluation and Benchmarking · automated theorem proving · generative language modeling · GPT-f · +1 more

8 · arXiv · cs.AI · Jun 5, 2026

Goedel-Architect achieves state-of-the-art formal theorem proving with blueprint-based agentic framework

Goedel-Architect is an agentic framework for formal theorem proving in Lean 4 that uses blueprint generation — a dependency graph of definitions and lemmas — rather than recursive decomposition, enabling parallel lemma closure and global refinement. Built on DeepSeek-V4-Flash (284B-A13B), it achieves 99.2% pass@1 on MiniF2F-test and 75.6% on PutnamBench, scaling to 100% on MiniF2F, 88.8% on PutnamBench, and 4/6 on IMO 2025 when seeded with natural-language proofs. The authors claim state-of-the-art performance for an open-source pipeline at up to 500x lower cost than comparable systems.

Frontier Model Releases · Evaluation and Benchmarking · MiniF2F · DeepSeek-V4-Flash · Goedel-Architect · +3 more
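To make the blueprint idea in the Goedel-Architect summary concrete, here is a minimal sketch of how a dependency graph of definitions and lemmas can be split into batches that could be closed in parallel. It is not the authors' implementation: the blueprint contents and the `parallel_batches` helper are hypothetical, and the only library used is Python's standard `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical blueprint: each item maps to the set of items it depends on.
blueprint = {
    "def_even": set(),
    "def_odd": set(),
    "lemma_even_add": {"def_even"},
    "lemma_odd_add": {"def_odd"},
    "lemma_parity": {"def_even", "def_odd"},
    "main_theorem": {"lemma_even_add", "lemma_odd_add", "lemma_parity"},
}


def parallel_batches(deps):
    """Group items into batches whose members do not depend on one another."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the blueprint is cyclic
    batches = []
    while sorter.is_active():
        # Everything whose dependencies are already marked done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # A real system would mark items done only after a prover closes them.
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(parallel_batches(blueprint)):
        print(i, batch)
```

Each batch contains only items whose prerequisites are already closed, so the items within a batch can be attempted concurrently. Recursive decomposition, by contrast, discovers subgoals one branch at a time.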

4 · arXiv · cs.AI · Jul 15, 2026

FormalAnalyticGeo: Neural-symbolic framework for automatic analytic geometry problem generation yields 7K-problem dataset

Researchers introduce FormalAnalyticGeo, a pipeline using four specialized LLM components and a formal intermediate language (CDL) to automatically generate multimodal analytic geometry problems with diagram rendering via a Signed Distance Field engine. The closed-loop system requires no human annotation and produces AnalyticGeo7K, a dataset of over 7,000 verified problems with aligned text, diagrams, formal annotations, and ground-truth answers. Generated problems achieve a median ground-truth relative error of 0.70%, with 82.3% of answers within 5% of exact symbolic solutions. The work addresses a recognized gap in math reasoning benchmarks where analytic geometry is underrepresented due to data scarcity.

Evaluation and Benchmarking · Multimodal Progress · FormalAnalyticGeo · Condition Description Language · AnalyticGeo7K
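The two accuracy figures in the FormalAnalyticGeo summary (median relative error, and the share of answers within 5% of the exact solution) can be computed as in the sketch below. The paper's exact metric definition is not given in this summary, so the standard relative-error formula is assumed. The function names and sample values are hypothetical and are not drawn from AnalyticGeo7K.

```python
from statistics import median


def relative_error(predicted: float, exact: float) -> float:
    """Standard relative error |predicted - exact| / |exact| (exact must be nonzero)."""
    return abs(predicted - exact) / abs(exact)


def summarize(pairs, tolerance=0.05):
    """Return (median relative error, fraction of answers within the tolerance)."""
    errors = [relative_error(p, e) for p, e in pairs]
    within = sum(err <= tolerance for err in errors) / len(errors)
    return median(errors), within


if __name__ == "__main__":
    # Made-up (ground-truth answer, exact symbolic answer) pairs for illustration.
    sample = [(10.0, 10.0), (4.9, 5.0), (2.2, 2.0), (101.0, 100.0)]
    med, frac = summarize(sample)
    print(f"median relative error: {med:.2%}, within 5%: {frac:.1%}")
```

Reporting the median alongside a threshold fraction keeps a few large outliers from dominating the headline number.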

7 · OpenAI Blog · May 20, 2026

OpenAI Neural Theorem Prover Solves Formal Math Olympiad Problems in Lean

OpenAI developed a neural theorem prover integrated with the Lean proof assistant that can solve challenging high-school olympiad problems, including problems from AMC12, AIME, and two IMO-adapted problems. The system demonstrates automated formal mathematical reasoning at a level previously requiring human expertise. This represents a significant capability milestone in AI-assisted formal verification and mathematical problem-solving.

Frontier Model Releases · Evaluation and Benchmarking · AIME · Neural Theorem Prover · OpenAI · +3 more

5 · Hugging Face Blog · May 19, 2026

Kimina-Prover-RL: Reinforcement Learning for Formal Mathematical Proving

Hugging Face blog post introduces Kimina-Prover-RL, a model trained with reinforcement learning targeting formal mathematical theorem proving. The post appears to describe a system from the AI-MO (AI for Math Olympiad) initiative. This represents a development in applying RL to formal proof generation, a competitive area involving Lean/Mathlib-style verification environments.

Evaluation and Benchmarking · AI Safety Research · Kimina-Prover-RL · Hugging Face · AI-MO · +1 more

5 · arXiv · cs.AI · May 20, 2026

AI-Assisted Theorem Proving in Lean 4: Aristotle API Case Study on IMO 2009 Problem 6

This paper presents a case study of using the Aristotle API for AI-assisted formal theorem proving in Lean 4, targeting the Grasshopper problem (IMO 2009 Problem 6). The generated artifact verifies four helper lemmas but leaves the main theorem unresolved via a 'sorry' placeholder, exposing a key limitation: local proof search can succeed while global combinatorial bookkeeping remains unsolved. The study provides a reproducible Lean artifact and precise analysis distinguishing verified from unverified proof content, offering a concrete benchmark for evaluating AI formalization capabilities.

Evaluation and Benchmarking · Agent and Tool Ecosystem · AI-assisted theorem proving · Grasshopper Problem (IMO 2009 P6) · Aristotle API · +1 more
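The distinction the Aristotle case study draws between verified and unverified content can be seen in a few lines of Lean 4. The statements below are illustrative stand-ins written for this summary, not the Grasshopper formalization or any lemma from the paper's artifact.

```lean
-- Helper lemma: fully proved, so Lean's kernel checks it.
theorem helper_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Main statement: the file still compiles, but only with a warning,
-- because `sorry` stands in for the missing proof.
theorem main_claim (a b c : Nat) : a + (b + c) = c + (b + a) := by
  sorry

-- Lists the axioms a declaration depends on; for `main_claim`
-- this includes `sorryAx`, which flags it as unverified.
#print axioms main_claim
```

A file like this type-checks as a whole, which is why an artifact needs a per-declaration audit to separate what is actually proved from what is merely stated.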

6 · arXiv · cs.CL · May 29, 2026

COMPOSE: Dual-Graph Framework for Generating Future Mathematical Theorems from Citations and Formal Structure

COMPOSE is a framework that generates plausible future mathematical theorem-like claims by conditioning a language model on both a scientific citation graph and a formal theorem dependency graph simultaneously. The authors construct a dataset of 108K paired scientific-formal graph examples from arXiv and Mathlib, plus a benchmark of 47K future papers from 2024–2025. Experiments show COMPOSE outperforms baselines on retrieval to real future papers and LLM-judge evaluation, producing more grounded and mathematically richer outputs. The work advances AI-assisted mathematical reasoning by combining informal scientific context with formal proof structure.

Frontier Model Releases · Evaluation and Benchmarking · COMPOSE · Mathlib · grounded future mathematical generation · +3 more