benchmark
MATH-MCQA
benchmark · active · provisional
math-mcqa-c58b8f0b · 1 event · first seen 37h ago
Aliases: MATH-MCQA
Recent events (1)
Uncertainty-Based Decontamination (UBD) framework for removing benchmark contamination from LLMs
Researchers propose Uncertainty-Based Decontamination (UBD), a method that uses deep ensembles of a contaminated model to estimate per-sample memorization and correct for benchmark data contamination without requiring access to an uncontaminated reference model. The approach introduces a sample-level evaluation framework using distributional distance metrics alongside aggregate accuracy to better characterize decontamination quality. Experiments on MMLU-Pro and MATH-MCQA show UBD produces output distributions closer to uncontaminated baselines than paraphrasing or choice-permutation methods. The work addresses a significant validity concern in LLM evaluation, where contamination inflates reported benchmark performance.
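The sample-level evaluation idea can be illustrated with a minimal sketch. The summary does not say which distributional distance the authors use, so the choice of Jensen-Shannon divergence here is an assumption, as are the function names `js_divergence` and `evaluate` and the toy probabilities. The sketch compares a candidate model's per-question distribution over answer choices with a reference distribution (for example, from an uncontaminated baseline) and reports the mean distance alongside aggregate accuracy.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Assumption: JS divergence stands in for the unspecified distance metric.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def evaluate(candidate, reference, answers):
    """Return (aggregate accuracy, mean per-sample JS divergence).

    candidate, reference: one probability list per question, over answer choices.
    answers: index of the correct choice for each question.
    """
    correct = 0
    distances = []
    for p, q, gold in zip(candidate, reference, answers):
        predicted = max(range(len(p)), key=p.__getitem__)
        correct += predicted == gold
        distances.append(js_divergence(p, q))
    n = len(answers)
    return correct / n, sum(distances) / n


# Hypothetical toy data: two four-choice questions.
candidate = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]]
reference = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.5, 0.3, 0.1]]
accuracy, mean_distance = evaluate(candidate, reference, answers=[0, 1])
```

Two models can reach the same accuracy while differing sharply in how probability mass is spread over the choices, which is why a per-sample distance adds information that the aggregate score hides.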