Entity · benchmark

ReproRepo

benchmark · active · reprorepo-60b5f3e0 · 1 event · first seen Jun 17, 2026

Aliases: ReproRepo

Co-occurring entities

OpenAI · Codex · GPT-5.5

More like this (12)

RePro · REPOCOD · NL2Repo-Bench · Repomix · RepoPeftBench · RepoBench · RECOM · ReaORE · Replit · OneRec · Apart Research · RECAP

Recent events (1)

5 · arXiv · cs.LG · Jun 17, 2026

ReproRepo: Scalable LLM agent framework for reproducibility auditing using GitHub issues

ReproRepo is a new framework for evaluating LLM agents on reproducibility auditing of ML research. It uses naturally occurring GitHub issues as supervision signals rather than costly manual curation. The framework is instantiated on 1,149 recent ML papers from major conferences and is used to benchmark four frontier model-agent configurations. The best-performing agent (Codex with GPT-5.5) surfaces at least one semantically related human-reported reproduction blocker for about 90% of papers, though pinpointing the exact location of an issue remains a weakness. The work provides a reusable, scalable evaluation harness for this underexplored agentic task.

Evaluation and Benchmarking · Agent and Tool Ecosystem · OpenAI · ReproRepo · Codex