Entity · product

Lean 4

product · active · lean-4-5763fa07 · 5 events · first seen May 18, 2026

Aliases: Lean 4


More like this (12)

Lean · Leanstral · lean-lsp-mcp · Grok-4-Fast · FMRP-LEAN · Phi-4-mini · Inter4K · BLEU-4 · Llama-4-Maverick · Light-Heart-Labs · GPT-4.1 · Gemma 4

Recent events (5)

7 · Mistral AI News · Jul 3, 2026

Mistral releases Leanstral 1.5, a state-of-the-art open-source formal verification model

Mistral AI released Leanstral 1.5, an Apache-2.0 licensed mixture-of-experts model with 119B total and 6B active parameters, specialized for formal verification in Lean 4. The model saturates miniF2F (100%), solves 587/672 PutnamBench problems, and achieves new state-of-the-art results on FATE-H (87%) and FATE-X (34%), while costing roughly $4 per problem versus ~$300 for comparable systems. Trained via mid-training, supervised fine-tuning, and reinforcement learning with CISPO, it demonstrates strong test-time scaling and practical code verification capabilities, uncovering 5 previously unknown bugs across 57 open-source repositories.

Frontier Model Releases · Evaluation and Benchmarking · Mistral AI · FATE-H · Claude Opus 4.6 · +10 more
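The Leanstral 1.5 summary describes a mixture-of-experts model with 119B total parameters but only 6B active. That means roughly 5% of the weights are used per token (6/119 ≈ 0.05). The following minimal Python sketch of generic top-k expert routing shows where that gap comes from. The router scores, expert count, k, and per-expert sizes are all illustrative assumptions. They are not Leanstral's actual configuration, which the summary does not give.

```python
import math


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their softmax weights.

    Generic top-k gating, used here as an illustrative assumption, not Leanstral's router.
    """
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(shared: float, per_expert: float, n_experts: int, k: int) -> float:
    """Fraction of parameters touched per token when only k of n_experts run."""
    total_params = shared + n_experts * per_expert
    active_params = shared + k * per_expert
    return active_params / total_params


# Hypothetical router scores for one token over 4 experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)  # experts 1 and 3 are selected; experts 0 and 2 stay idle

# Hypothetical sizes (in billions): 1B shared weights, 8 experts of 1B each, k = 2.
print(active_fraction(shared=1.0, per_expert=1.0, n_experts=8, k=2))
```

Only the selected experts' weights are read for a given token. Active parameters therefore scale with k, while total parameters scale with the number of experts.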

8 · arXiv · cs.AI · Jun 5, 2026

Goedel-Architect achieves state-of-the-art formal theorem proving with blueprint-based agentic framework

Goedel-Architect is an agentic framework for formal theorem proving in Lean 4 that uses blueprint generation — a dependency graph of definitions and lemmas — rather than recursive decomposition, enabling parallel lemma closure and global refinement. Built on DeepSeek-V4-Flash (284B-A13B), it achieves 99.2% pass@1 on MiniF2F-test and 75.6% on PutnamBench, scaling to 100% on MiniF2F, 88.8% on PutnamBench, and 4/6 on IMO 2025 when seeded with natural-language proofs. The authors claim state-of-the-art performance for an open-source pipeline at up to 500x lower cost than comparable systems.

Frontier Model Releases · Evaluation and Benchmarking · MiniF2F · DeepSeek-V4-Flash · Goedel-Architect · +3 more
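Goedel-Architect's blueprint is described as a dependency graph of definitions and lemmas. This structure is what makes parallel lemma closure possible: any lemma whose dependencies are already closed can be attempted alongside its siblings. The Python sketch below groups a blueprint into such "waves" using Kahn-style topological layering. The dictionary format and node names are hypothetical, and this is a generic illustration rather than the framework's implementation.

```python
from collections import defaultdict


def proof_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group blueprint nodes into waves; each node depends only on nodes in earlier waves.

    deps maps a node to the set of nodes it uses (hypothetical blueprint format).
    """
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {n: 0 for n in nodes}   # number of unclosed dependencies per node
    users = defaultdict(list)          # reverse edges: dependency -> nodes that use it
    for node, ds in deps.items():
        for d in ds:
            indegree[node] += 1
            users[d].append(node)

    waves: list[list[str]] = []
    wave = sorted(n for n in nodes if indegree[n] == 0)
    closed = 0
    while wave:
        waves.append(wave)             # everything in this wave can be proved in parallel
        closed += len(wave)
        ready = []
        for n in wave:
            for u in users[n]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    ready.append(u)
        wave = sorted(ready)
    if closed != len(nodes):
        raise ValueError("blueprint has a dependency cycle")
    return waves


# Hypothetical blueprint: two lemmas share one definition and feed the main theorem.
blueprint = {
    "main_theorem": {"lemma_a", "lemma_b"},
    "lemma_a": {"def_f"},
    "lemma_b": {"def_f"},
}
waves = proof_waves(blueprint)
print(waves)
```

The key contrast with recursive decomposition is that `lemma_a` and `lemma_b` land in the same wave. They can therefore be dispatched to separate prover calls at once.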

7 · arXiv · cs.AI · May 25, 2026

Agentic Proving for Program Verification: Claude Code Achieves 98.1% on CLEVER Benchmark

Researchers evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. It achieves 98.1% end-to-end success on program generation and verification across the benchmark's self-consistent entries. The system generates valid specifications for 98.8% of problems and certifies implementations against ground-truth specifications for 87.5% of problems. The results reveal a growing mismatch between the difficulty of existing program-verification benchmarks and the capabilities of modern agentic provers, motivating calls for more rigorous evaluation methodologies. The findings support compiler-in-the-loop agentic paradigms as the current state of the art for foundational program verification.

Evaluation and Benchmarking · AI Safety Research · CLEVER · isomorphism-based scoring · agentic proving · +4 more
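The CLEVER result is credited to compiler-in-the-loop agentic proving. In this pattern, a verifier checks each candidate, and the checker's error messages are fed back into the next attempt. The Python loop below is a minimal sketch of that pattern under stated assumptions. `propose` stands in for a model call and `check` for running the Lean compiler. Both are hypothetical stubs here, and the attempt budget is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    errors: str = ""  # compiler diagnostics to feed back on failure


def prove_with_feedback(
    propose: Callable[[str], str],
    check: Callable[[str], CheckResult],
    max_attempts: int = 4,
) -> Optional[str]:
    """Ask for a candidate, check it, and retry with the checker's errors as feedback."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose(feedback)
        result = check(candidate)
        if result.ok:
            return candidate
        feedback = result.errors
    return None  # budget exhausted without a checked proof


# Hypothetical stubs: the "model" first emits a placeholder, then repairs it after feedback.
seen_feedback: list[str] = []


def fake_propose(feedback: str) -> str:
    seen_feedback.append(feedback)
    return "by omega" if "sorry" in feedback else "by sorry"


def fake_check(proof: str) -> CheckResult:
    if "sorry" in proof:
        return CheckResult(ok=False, errors="declaration uses 'sorry'")
    return CheckResult(ok=True)


proof = prove_with_feedback(fake_propose, fake_check)
print(proof)
```

The loop only ever returns a candidate that passed `check`. This is why such pipelines report verified results rather than plausible-looking ones.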

5 · arXiv · cs.AI · May 20, 2026

AI-Assisted Theorem Proving in Lean 4: Aristotle API Case Study on IMO 2009 Problem 6

This paper presents a case study of using the Aristotle API for AI-assisted formal theorem proving in Lean 4, targeting the Grasshopper problem (IMO 2009 Problem 6). The generated artifact verifies four helper lemmas but leaves the main theorem unproved behind a 'sorry' placeholder. This exposes a key limitation: local proof search can succeed while global combinatorial bookkeeping remains unsolved. The study provides a reproducible Lean artifact and a precise analysis distinguishing verified from unverified proof content, offering a concrete benchmark for evaluating AI formalization capabilities.

Evaluation and Benchmarking · Agent and Tool Ecosystem · AI-assisted theorem proving · Grasshopper Problem (IMO 2009 P6) · Aristotle API · +1 more
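The Aristotle case study hinges on how Lean treats `sorry`. A file containing it still compiles, with a warning, but the affected declaration is not actually proved. The toy Lean 4 snippet below is hypothetical and unrelated to the Grasshopper formalization. It shows a fully checked helper lemma next to a main theorem that is left open. `#print axioms` is one way to tell the two apart, since a declaration that relies on `sorry` lists `sorryAx` among its axioms.

```lean
-- Hypothetical toy example, not the IMO 2009 P6 artifact.

-- A helper lemma that Lean fully checks (omega decides linear arithmetic over Nat).
theorem helper_le_double (n : Nat) : n ≤ 2 * n := by
  omega

-- The main statement uses the helper but leaves the remaining step open.
-- Lean accepts the file and warns "declaration uses 'sorry'".
theorem main_claim (n : Nat) (h : 0 < n) : n < 2 * n := by
  have _hle := helper_le_double n  -- verified step
  sorry                            -- unverified gap

-- The reported axioms include sorryAx, flagging main_claim as unproved.
#print axioms main_claim
```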

7 · Mistral AI News · May 18, 2026

Mistral Releases Leanstral: First Open-Source Code Agent for Lean 4 Formal Verification

Mistral AI has released Leanstral, an open-source code agent built on a sparse architecture with 120B total and 6B active parameters, designed specifically for formal proof engineering in Lean 4. The model targets realistic proof-engineering workflows rather than isolated math competition problems. It is benchmarked on FLTEval, a new evaluation suite tied to the Fermat's Last Theorem formalization project. Leanstral is released under Apache 2.0 with a free API endpoint and MCP support, and performs competitively with Claude Sonnet 4.6 at roughly 1/15 of the cost. The release positions formal verification as a scalable alternative to human code review for high-stakes software and mathematics.

Evaluation and Benchmarking · Open Weights Progress · Mistral AI · Claude Sonnet 4 · Claude Opus 4.6 · +11 more
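The Leanstral launch mentions MCP support. In the Model Context Protocol, a client invokes a server-side tool with a JSON-RPC 2.0 request. The request's method is `tools/call`, and its params carry the tool `name` and its `arguments`. The Python sketch below builds such a request. The tool name `check_lean_file` and its `path` argument are hypothetical placeholders, not tools that Leanstral or any specific Lean MCP server is known to expose.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC matches each response to its request by id.
_request_ids = count(1)


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, for illustration only.
request = tools_call_request("check_lean_file", {"path": "Proofs/Main.lean"})
print(request)
```

Transport setup (stdio or HTTP) and the protocol's initialization handshake are omitted. A real client would also list the server's tools before calling one.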