Entity · benchmark

MLE-bench

benchmark · active · mle-bench-1b768286 · 2 events · first seen May 20, 2026

Aliases: MLE-bench

Co-occurring entities

MLEvolve · Progressive MCGS · AlphaEvolve · InternScience · Retrospective Memory · Kaggle · o1-preview · OpenAI

More like this (12)

MT-Bench · MSE-Bench · MLE Bench Lite · MMBench2 · ClinMM-Bench · LexNeo-Bench · LiveBench · EdgeBench · LabBench · ESI-Bench · MaDI-Bench · FoldBench

Recent events (2)

6 · arXiv · cs.CL · Jun 5, 2026

MLEvolve: Self-evolving multi-agent framework for automated ML algorithm discovery

MLEvolve is a new LLM-based multi-agent framework for end-to-end machine learning algorithm discovery. It targets two limitations of existing MLE agents: information isolation and memoryless search. The system introduces Progressive MCGS (a graph-extended tree search), Retrospective Memory for accumulating experience, and a decoupling of strategic planning from code generation. Evaluated on MLE-bench, it achieves state-of-the-art medal and valid submission rates within a 12-hour budget, and it also outperforms AlphaEvolve on mathematical algorithm optimization tasks.

Evaluation and Benchmarking · Agent and Tool Ecosystem · MLEvolve · MLE-bench · Progressive MCGS · +3 more

6 · OpenAI Blog · May 20, 2026

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

OpenAI introduces MLE-bench, a benchmark designed to measure AI agent performance on machine learning engineering tasks. The benchmark draws from Kaggle competitions to evaluate agents on realistic ML engineering workflows. Initial results show that current agents, including those powered by o1-preview, achieve competitive performance on a subset of tasks but fall well short of top human competitors. The benchmark is intended to track progress in agentic ML capabilities over time.

Frontier Model Releases · Evaluation and Benchmarking · Kaggle · o1-preview · MLE-bench · +2 more