Entity · product

Lean

product · active · lean-8bd714e1 · 7 events · first seen May 19, 2026

Aliases: Lean

More like this (12)

Lean 4 · Leanstral · lean-lsp-mcp · Linear · ALIGN · Slack · INSHAPE · Elastic Weight Consolidation · FMRP-LEAN · MLE-bench · Merge Labs · LIME

Recent events (7)

6 · arXiv · cs.LG · 4d ago

CausalForge: Agentic framework for automated causal inference research grounded in Lean formal proofs

CausalForge is a new agentic research framework that automates theoretical research in causal inference by grounding its outputs in the Lean proof assistant rather than relying on LLM reviewers, which the authors note accept fabricated papers at near-chance rates. The system combines Causalean, a 7,035-declaration Lean library for causal inference built with LLM assistance under human oversight, with CausalSmith, a self-improving pipeline that selects topics, proposes results, formalizes statements, and constructs machine-checked proofs. A statement audit step bridges the gap between formal correctness and scientific validity by comparing each theorem against its informal intended claim. The framework closes the automated research loop with formal verification instead of LLM judgment.

Evaluation and Benchmarking · AI Safety Research · CausalForge · Bad Scientist · CausalSmith · +4 more

6 · arXiv · cs.AI · Jul 16, 2026

Generative Compilation: Real-time compiler feedback during LLM code generation via sealors

Researchers introduce generative compilation, a technique that provides compiler feedback on partial programs during autoregressive LLM decoding rather than only after generation completes. The core mechanism is a 'sealor': a syntax-guided transformation, formally verified in Lean, that converts partial programs into complete ones that standard compilers can analyze. Evaluated on repository-level Rust coding tasks across frontier black-box and open-weight models, the approach reduces non-compiling outputs and improves functional correctness by catching errors early and preventing error cascades. The method works without white-box model access, which distinguishes it from constrained decoding approaches.
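To make the idea of turning a partial program into a complete one concrete, here is a deliberately crude sketch. It only appends closing delimiters for brackets left open in a truncated snippet; it is not the paper's sealor, which is syntax-guided and formally verified. The function name `seal` and the sample input are hypothetical, and the toy ignores string literals and comments.

```python
# Toy illustration only: balance open delimiters in a truncated snippet.
# This is NOT the sealor from the paper; `seal` is a hypothetical helper.

CLOSERS = {"(": ")", "[": "]", "{": "}"}


def seal(partial: str) -> str:
    """Append a closing delimiter for every bracket left open in `partial`.

    Assumption: brackets inside string literals or comments are not
    handled, so this is only suitable as a demonstration.
    """
    stack = []
    for ch in partial:
        if ch in CLOSERS:
            # Remember which closer this opener will eventually need.
            stack.append(CLOSERS[ch])
        elif stack and ch == stack[-1]:
            # The most recent opener has been closed by the text itself.
            stack.pop()
    # Close the innermost open delimiter first.
    return partial + "".join(reversed(stack))


# Hypothetical partial output captured mid-generation.
partial_program = "fn main() { foo(bar(1, 2"
sealed = seal(partial_program)
print(sealed)
```

In a setup like the one described, the sealed text would then be handed to an ordinary compiler so that diagnostics can be fed back before generation finishes; that step is omitted here.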

Evaluation and Benchmarking · Agent and Tool Ecosystem · Generative Compilation: On-the-Fly Compiler Feedback as AI Generates Code · Rust · Lean

4 · arXiv · cs.CL · Jun 29, 2026

Signal-Coverage Matrix proposes finer-grained evaluation of LLM autoformalization errors

A new arXiv preprint introduces the signal-coverage matrix, a 2×2 framework that crosses Lean elaborator pass/fail with semantic-equivalence judgments to decompose autoformalization errors into four distinct cells rather than a single type-correctness scalar. The authors evaluate four methods (Vanilla, Lean-Retry, Sample-Filter, and Stratified Autoformalization) on ProofNet# and MiniF2F-test using DeepSeek V4-Pro, finding that headline TC% gains mask flat semantic-only error recovery and that symbolic and LLM judges diverge by 26–37 percentage points on elaborator-feedback outputs. The work argues that TC% improvements should be credited by which error cell moved, not by the aggregate scalar alone.
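The 2×2 decomposition can be sketched as a simple tally. The snippet below is an illustration under stated assumptions, not the authors' code: the cell names, the function names, and the sample outcomes are all hypothetical. Each item is a pair of booleans, (elaborator passed, judged semantically equivalent).

```python
from collections import Counter


def signal_coverage_matrix(results):
    """Tally (elaborates, equivalent) pairs into the four cells of a 2x2 grid.

    Cell names are assumptions made for this sketch; the preprint may
    label them differently.
    """
    counts = Counter((bool(e), bool(s)) for e, s in results)
    return {
        "pass_equivalent": counts[(True, True)],
        "pass_not_equivalent": counts[(True, False)],
        "fail_equivalent": counts[(False, True)],
        "fail_not_equivalent": counts[(False, False)],
    }


def type_correct_rate(matrix):
    """Aggregate TC%: the share of items the elaborator accepted."""
    total = sum(matrix.values())
    passed = matrix["pass_equivalent"] + matrix["pass_not_equivalent"]
    return passed / total if total else 0.0


# Hypothetical outcomes for five autoformalized statements.
sample = [(True, True), (True, False), (True, False), (False, True), (False, False)]
matrix = signal_coverage_matrix(sample)
print(matrix, type_correct_rate(matrix))
```

Two methods can share the same aggregate rate while filling the cells differently, which is why the summary's point about crediting the cell that moved matters.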

Evaluation and Benchmarking · DeepSeek V4 · MiniF2F · Lean · +2 more

4 · arXiv · cs.CL · Jun 16, 2026

Informath: Symbolic informalization for converting formal proofs to fluent natural language

The paper introduces Informath, a project for symbolic informalization — converting formally verified mathematics into readable natural language without loss of precision. The architecture uses Dedukti as an interlingua hub connecting proof systems (Agda, Lean, Rocq) and Grammatical Framework (GF) for multilingual natural language generation. The work is relevant to AI-assisted formal verification pipelines where autoformalization produces machine-checked proofs that need to be made human-interpretable.

Informath · Grammatical Framework · Dedukti · +3 more

8 · arXiv · cs.AI · May 22, 2026

Large-Scale Evaluation of LLM-Driven Formal Proof Search on Open Mathematical Problems

Researchers present the first large-scale evaluation of LLM-based formal proof search on genuinely open mathematical problems, using Lean as a verification backend. Their most capable agent autonomously resolved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures, at a cost of a few hundred dollars per problem. The system is already being deployed in active research across combinatorics, optimization, graph theory, algebraic geometry, and quantum optics. The study also compares agent architectures, finding that more sophisticated designs outperform simple generate-and-verify loops on the hardest problems.

Frontier Model Releases · Evaluation and Benchmarking · large language models · Erdős Problems · OEIS Conjectures · +3 more

7 · OpenAI Blog · May 20, 2026

OpenAI Neural Theorem Prover Solves Formal Math Olympiad Problems in Lean

OpenAI developed a neural theorem prover integrated with the Lean proof assistant that can solve challenging high-school olympiad problems, including problems from AMC12, AIME, and two IMO-adapted problems. The system demonstrates automated formal mathematical reasoning at a level previously requiring human expertise. This represents a significant capability milestone in AI-assisted formal verification and mathematical problem-solving.

Frontier Model Releases · Evaluation and Benchmarking · AIME · Neural Theorem Prover · OpenAI · +3 more

6 · Hugging Face Blog · May 19, 2026

Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models

Kimina-Prover is a new large formal reasoning model that combines reinforcement learning with test-time search to improve mathematical theorem proving. The approach applies RL-trained search strategies at inference time, targeting formal proof generation in systems like Lean. The work is published via the AI-MO (AI for Math Olympiad) team on Hugging Face, continuing the trend of applying RL and extended compute at test time to hard reasoning tasks.

Frontier Model Releases · Evaluation and Benchmarking · Kimina-Prover-RL · Hugging Face · AI-MO · +4 more