Entity · benchmark

MATH500

benchmark · active · math500-c134f926 · 2 events · first seen May 29, 2026

Aliases: MATH500

Co-occurring entities

HumanEval · LLaDA-8B-Base · EB-Sampler · ADAS · MBPP · Dream-7B-Base · GSM8K · Fast-dLLM · power distribution · Entropy-Cut Metropolis-Hastings · GPQA Diamond · AIME26 · Metropolis-Hastings

More like this (12)

MATH-500 · MATH-MCQA · Big-Math · MATH · DAPO-Math · Mathlib · AdvancedMathBench · Illustrative Mathematics · Axiom Math · CS336 · MATH benchmark · MathVista

Recent events (2)

5 · arXiv · cs.CL · Jun 10, 2026

ADAS: Attention-Discounted Adaptive Sampler improves parallel decoding for masked diffusion language models

Researchers propose ADAS, a training-free reranking rule for decoding masked diffusion language models. It targets failures that arise when interacting tokens are committed in parallel. The method greedily penalizes candidates that attend strongly to already-selected uncertain positions, using attention weights as soft marginal penalties rather than hard constraints. Evaluated on LLaDA-8B-Base and Dream-7B-Base across GSM8K, MATH500, HumanEval, and MBPP, ADAS improves performance at low NFE (number of function evaluations) by 9–10 percentage points on average when plugged into existing samplers, with only 3.1% runtime overhead.
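The greedy, attention-penalized selection described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the `penalty` weight, the per-position `uncertainty` values, and the exact scoring formula (confidence minus attention-weighted uncertainty of already-selected positions) are all assumptions made for illustration.

```python
def greedy_attention_discounted_select(confidence, attention, uncertainty, k, penalty=1.0):
    """Pick up to k candidate positions to commit in one parallel step.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score(i) = confidence[i] - penalty * sum_j attention[i][j] * uncertainty[j]
    where j ranges over positions already selected in this step.
    """
    selected = []
    remaining = set(range(len(confidence)))
    while remaining and len(selected) < k:
        def discounted(i):
            # Soft penalty: grows with attention paid to uncertain selected positions.
            cost = sum(attention[i][j] * uncertainty[j] for j in selected)
            return confidence[i] - penalty * cost

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=discounted)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: position 1 is confident but attends strongly to position 0.
confidence = [0.90, 0.85, 0.60]
uncertainty = [0.5, 0.5, 0.5]
attention = [
    [0.0, 0.1, 0.1],
    [0.8, 0.0, 0.1],
    [0.05, 0.1, 0.0],
]
picked = greedy_attention_discounted_select(confidence, attention, uncertainty, k=2)
```

With these toy numbers, the penalized rule commits positions 0 and 2 together, whereas plain confidence ranking (`penalty=0.0`) would commit 0 and 1. The penalty is soft: a strongly attending candidate can still be chosen if its confidence is high enough.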

Frontier Model Releases · Inference Economics · LLaDA-8B-Base · MATH500 · EB-Sampler · +6 more

7 · arXiv · cs.LG · May 29, 2026

Entropy-Cut Metropolis-Hastings: Sampling-Based Reasoning Without RL Training

This paper introduces Entropy-Cut Metropolis-Hastings (ECMH), an algorithm that samples from a 'power distribution' over base language model outputs to elicit strong reasoning without reinforcement learning post-training. Rather than cutting reasoning traces at uniformly random positions, ECMH uses next-token entropy as a proxy to identify consequential decision points (e.g., the choice of proof strategy) and resamples from those positions. The authors prove that mixing time scales with the number of decisions rather than the number of tokens, and report consistent improvements over RL-trained models on MATH500, HumanEval, GPQA Diamond, and AIME26.
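The entropy-based choice of cut points can be sketched as follows. This covers only the step of locating high-entropy positions in a trace. The threshold rule, the function names, and the toy distributions are assumptions for illustration, and the Metropolis-Hastings accept/reject step is omitted.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def candidate_cut_positions(next_token_dists, threshold):
    """Return positions whose next-token entropy meets the threshold.

    Thresholding is a hypothetical selection rule; the source only says
    that entropy is used as a proxy for consequential decision points.
    """
    return [
        t for t, dist in enumerate(next_token_dists)
        if token_entropy(dist) >= threshold
    ]


# Toy trace of three positions: certain, maximally uncertain, mostly certain.
dists = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.9, 0.1],
]
cuts = candidate_cut_positions(dists, threshold=0.5)
```

Here only position 1 qualifies, since its entropy is ln 2 (about 0.693 nats) while the other two fall below 0.5. A sampler in this style would resample the trace from such a position instead of from a uniformly random one.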

Frontier Model Releases · Evaluation and Benchmarking · power distribution · MATH500 · Entropy-Cut Metropolis-Hastings · +6 more