benchmark
SWE-Explore
benchmark · active · provisional
swe-explore-074f8545 · 1 event · first seen 9d ago
Aliases: SWE-Explore
Recent events (1)
SWE-Explore: New benchmark isolates repository exploration capability in coding agents
SWE-Explore is a new benchmark targeting repository exploration as a distinct, fine-grained capability of coding agents, separate from end-to-end task resolution. It covers 848 issues across 10 programming languages and 203 open-source repositories, with line-level ground truth derived from successful agent trajectories. Evaluation across retrieval methods, coding agents, and specialized localizers finds that agentic explorers outperform classical retrieval, and that line-level coverage and efficient ranking remain the key differentiators at the frontier. The benchmark addresses a gap in SWE-bench-style evaluations that treat task resolution as a binary outcome.
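To make "line-level coverage" concrete, here is a minimal sketch of how an explorer's output could be scored against line-level ground truth. It is an illustration under stated assumptions, not SWE-Explore's actual scoring code: the summary above does not define the metric, so the data format (file path mapped to a set of line numbers) and the function names `line_recall` and `line_precision` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical format: file path -> set of 1-indexed line numbers.
# The benchmark's real schema is not given in the summary above.
Lines = Dict[str, Set[int]]


def line_recall(predicted: Lines, gold: Lines) -> float:
    """Fraction of ground-truth lines that the explorer surfaced."""
    total = sum(len(lines) for lines in gold.values())
    if total == 0:
        return 0.0
    hit = sum(len(lines & predicted.get(path, set()))
              for path, lines in gold.items())
    return hit / total


def line_precision(predicted: Lines, gold: Lines) -> float:
    """Fraction of surfaced lines that are in the ground truth."""
    total = sum(len(lines) for lines in predicted.values())
    if total == 0:
        return 0.0
    hit = sum(len(lines & gold.get(path, set()))
              for path, lines in predicted.items())
    return hit / total


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    gold = {"src/parser.py": {10, 11, 12, 13}, "src/lexer.py": {5}}
    predicted = {"src/parser.py": {11, 12, 40}, "docs/readme.md": {1}}
    print(line_recall(predicted, gold))
    print(line_precision(predicted, gold))
```

Reporting recall and precision separately reflects the distinction the summary draws: an explorer can find the right files yet still miss most of the relevant lines, or cover them only by surfacing a large amount of irrelevant code.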