Almanac
benchmark

IMO 2025

benchmark · active · imo-2025-a96f25b2 · 2 events · first seen 1mo ago

Aliases: IMO 2025

Recent events (2)

9 · arXiv · cs.CL · 5d ago

MaxProof achieves gold-medal-level performance on IMO 2025 and USAMO 2026 via population-level test-time scaling

MiniMax introduces MaxProof, a test-time scaling framework for competition-level mathematical proof, built on their MiniMax-M3 model. The system trains three capabilities (proof generation, verification, and critique-conditioned repair), then at inference time runs tournament selection over a population of candidate proofs. MaxProof scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, placing it at or above the human gold-medal threshold on both competitions.
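
The summary above does not spell out how the tournament works, so the following is only a minimal sketch of generic k-way tournament selection combined with a repair step, not MiniMax's implementation. All names (`tournament_select`, `evolve`, `toy_score`, `toy_repair`) are hypothetical, and the "proofs" are toy strings in which each token is either a sound step (`ok`) or an unresolved step (`gap`). A real verifier and a critique-conditioned repair model would replace the two toy functions.

```python
import random
from typing import Callable, List


def tournament_select(population: List[str], score: Callable[[str], float],
                      rng: random.Random, k: int = 2) -> str:
    """Draw k candidates at random and return the highest-scoring one."""
    contenders = rng.sample(population, k)
    return max(contenders, key=score)


def evolve(population: List[str], score: Callable[[str], float],
           repair: Callable[[str], str], rounds: int,
           rng: random.Random) -> str:
    """Repeat select-then-repair for a fixed number of rounds.

    The population size stays constant. Each tournament winner is kept
    or replaced by its repaired version, whichever scores higher.
    """
    for _ in range(rounds):
        winners = [tournament_select(population, score, rng)
                   for _ in range(len(population))]
        population = [max((w, repair(w)), key=score) for w in winners]
    return max(population, key=score)


# Toy stand-ins (assumptions for illustration only).
def toy_score(proof: str) -> float:
    """Stand-in verifier: fraction of steps marked 'ok'."""
    steps = proof.split()
    return sum(step == "ok" for step in steps) / len(steps)


def toy_repair(proof: str) -> str:
    """Stand-in repair: fix the first unresolved step, if any."""
    return proof.replace("gap", "ok", 1)


if __name__ == "__main__":
    candidates = ["gap gap gap", "ok gap gap", "ok ok gap"]
    best = evolve(candidates, toy_score, toy_repair,
                  rounds=3, rng=random.Random(0))
    print(best, toy_score(best))
```

On this toy data the weakest candidate can never win a two-way tournament, so every surviving candidate is fully repaired within two rounds regardless of the random seed. In a realistic setting the verifier is noisy and repair can fail, which is where a larger population and more rounds would matter.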

8 · Deepseek News · 1mo ago

DeepSeek-V3.2 and V3.2-Speciale Released: Reasoning-First Models with Agent Tool-Use Integration

DeepSeek has released two new open-weights models: DeepSeek-V3.2, the official successor to V3.2-Exp with balanced reasoning and tool-use capabilities, and DeepSeek-V3.2-Speciale, a reasoning-maximized variant claiming gold-medal performance on IMO, CMO, ICPC World Finals, and IOI 2025. V3.2 is the first DeepSeek model to integrate chain-of-thought thinking directly into tool-use workflows, and was trained on a new agent data synthesis pipeline covering 1,800+ environments and 85k+ complex instructions. V3.2-Speciale is offered through the API only, without tool-call support, via a temporary endpoint that expires December 15, 2025. Both models are open-sourced on Hugging Face, with an accompanying technical report.