
USAMO 2026

benchmark · active · provisional · usamo-2026-0e448be7 · 1 event · first seen 4d ago

Aliases: USAMO 2026

Recent events (1)

arXiv · cs.CL · 4d ago

MaxProof achieves gold-medal-level performance on IMO 2025 and USAMO 2026 via population-level test-time scaling

MiniMax introduces MaxProof, a test-time scaling framework for competition-level mathematical proof built on its MiniMax-M3 model. The system trains three capabilities (proof generation, verification, and critique-conditioned repair) and then, at inference time, runs tournament selection over a population of candidate proofs. MaxProof scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both competitions.
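The summary does not say how the tournament is structured. As a rough illustration only, the sketch below assumes a single-elimination knockout in which a pairwise judge (standing in for the trained verifier) keeps the stronger of two candidate proofs. The names `tournament_select` and `toy_judge`, and the numeric scores, are hypothetical and invented for this example; they are not MaxProof's actual procedure or API.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tournament_select(population: List[T], prefer: Callable[[T, T], T], seed: int = 0) -> T:
    """Hypothetical single-elimination tournament over candidate proofs.

    Each round pairs up the remaining candidates, keeps the winner of each
    pair as chosen by `prefer`, and gives an odd candidate a bye.
    """
    if not population:
        raise ValueError("population must be non-empty")
    rng = random.Random(seed)
    pool = list(population)
    rng.shuffle(pool)  # randomize the bracket
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            next_round.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # odd one out advances without a match
        pool = next_round
    return pool[0]


# Hypothetical candidates: (proof id, stand-in verifier score).
candidates = [("proof_a", 4), ("proof_b", 7), ("proof_c", 2), ("proof_d", 6), ("proof_e", 5)]


def toy_judge(a, b):
    # Stand-in for a learned verifier: keep the higher-scored proof (ties go to `a`).
    return a if a[1] >= b[1] else b


winner = tournament_select(candidates, toy_judge)
print(winner)  # ('proof_b', 7)
```

With a transitive judge like `toy_judge`, the top-scoring candidate wins every match it plays, so the shuffle changes only the bracket, not the winner. A model-based judge can be noisy or non-transitive, and that is where bracket order starts to matter.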