Entity · benchmark

AIME25

benchmark · active · aime25-a490a9b4 · 2 events · first seen May 21, 2026

Aliases: AIME25

Co-occurring entities

AIME24 · GRPO · Qwen3-1.7B · OPSD · SGSD · Skill-Conditioned Gated Self-Distillation (SGSD) · HMMT25 · RLVR · ROUGE-L · PPO · Qwen3-4B · GPQA Diamond · Phi-4-mini · LamPO · MATH-500

More like this (12)

AIME26 · AIME24 · AIME · AIME 2025 · AIME2024 · AI-MO · AMIA · AIME 2026 · AIA Labs · AIMS · HMMT25 · AIMO Progress Prize

Recent events (2)

6 · arXiv · cs.AI · May 28, 2026

Skill-Conditioned Gated Self-Distillation (SGSD) for LLM Reasoning

SGSD is a new on-policy self-distillation method for LLM reasoning that replaces trusted privileged information (e.g., reference answers) with an experience-derived skill bank of skill-mistake pairs. It constructs a multi-teacher pool, validates each teacher's contribution via a verifier, and applies a gated objective to distill informative disagreements while suppressing noisy signals. On Qwen3-1.7B, SGSD outperforms GRPO by 6.2% and answer-conditioned OPSD by 1.7% on average across AIME24, AIME25, and HMMT25. The method relaxes the assumption of trusted privileged information, making self-distillation more practical under weaker supervision.

Frontier Model Releases · Evaluation and Benchmarking · OPSD · AIME24 · SGSD
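
The gating idea in the SGSD summary above can be illustrated with a toy sketch. This is not the paper's objective: the function names, the hard verifier gate, the KL-based disagreement measure, and the `min_disagreement` threshold are all assumptions chosen for illustration. It only shows how a teacher's signal might be kept when it is both verified and informative, and suppressed otherwise.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def gated_distillation_loss(student, teachers, verified, min_disagreement=0.05):
    """Average KL(teacher || student) over teachers that pass a two-part gate.

    ASSUMPTION (illustrative only): a teacher passes the gate when
      1. a verifier accepted its contribution (verified[k] is True), and
      2. it disagrees with the student by more than min_disagreement.
    Returns 0.0 when no teacher passes.
    """
    terms = []
    for teacher_dist, ok in zip(teachers, verified):
        if not ok:
            continue  # suppressed: the verifier rejected this teacher
        disagreement = kl_divergence(teacher_dist, student)
        if disagreement > min_disagreement:
            terms.append(disagreement)  # informative disagreement: distill it
    return sum(terms) / len(terms) if terms else 0.0


# Hypothetical next-token distributions over a two-token vocabulary.
student = [0.5, 0.5]
teachers = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
verified = [True, True, False]
loss = gated_distillation_loss(student, teachers, verified)
```

In this toy setup only the first teacher contributes: the second agrees with the student, so it carries no signal, and the third fails verification.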

5 · arXiv · cs.CL · May 21, 2026

LamPO: Lambda-Style Policy Optimization with Pairwise Decomposed Advantage for Reasoning LMs

LamPO proposes a new RLVR training objective that replaces GRPO's scalar group-relative advantages with a Pairwise Decomposed Advantage, aggregating pairwise reward gaps within response groups and weighting comparisons by confidence-aware log-probability differences. The method retains a critic-free, clipped-update PPO-style structure and optionally adds a ROUGE-L-based dense auxiliary reward to reduce sparsity. Experiments on AIME24, AIME25, MATH-500, and GPQA-Diamond using Qwen3-1.7B, Qwen3-4B, and Phi-4-mini show consistent improvements over GRPO and other RLVR variants with more stable training dynamics.

Frontier Model Releases · Evaluation and Benchmarking · RLVR · ROUGE-L · AIME24
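
To make the contrast in the LamPO summary above concrete, the sketch below compares a GRPO-style scalar advantage (each reward standardized within its response group) with one possible pairwise aggregation. The pairwise function is an assumption for illustration, not the paper's formula: the sigmoid weighting on the absolute log-probability difference, the `temperature` parameter, and the averaging over pairs are all hypothetical choices.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: rewards standardized within one response group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pairwise_advantages(rewards, logprobs, temperature=1.0):
    """Hypothetical pairwise aggregation: A_i = mean over j != i of w_ij * (r_i - r_j).

    ASSUMPTION: w_ij is a sigmoid of |logp_i - logp_j| / temperature, standing in
    for a "confidence-aware" weight. It is symmetric in i and j.
    """
    n = len(rewards)
    if n < 2:
        return [0.0] * n  # no pairs to compare
    advantages = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            gap = rewards[i] - rewards[j]  # pairwise reward gap
            diff = abs(logprobs[i] - logprobs[j]) / temperature
            weight = 1.0 / (1.0 + math.exp(-diff))
            total += weight * gap
        advantages.append(total / (n - 1))
    return advantages


# Hypothetical group of four sampled responses with binary verifier rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
logprobs = [-12.0, -15.5, -9.0, -11.0]
scalar_adv = grpo_advantages(rewards)
pair_adv = pairwise_advantages(rewards, logprobs)
```

Because the assumed weight is symmetric and the reward gap is antisymmetric, the pairwise advantages sum to zero across the group, much as the standardized GRPO advantages do. Unlike the scalar version, two responses with the same reward can receive different advantages.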