Almanac
benchmark

SWE-Explore

benchmarkactiveprovisionalswe-explore-074f8545·1 events·first seen 9d ago

Aliases: SWE-Explore

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.CL·9d ago·source ↗

SWE-Explore: New benchmark isolates repository exploration capability in coding agents

SWE-Explore is a new benchmark targeting repository exploration as a distinct, fine-grained capability of coding agents, separate from end-to-end task resolution. It covers 848 issues across 10 programming languages and 203 open-source repositories, with line-level ground truth derived from successful agent trajectories. Evaluation across retrieval methods, coding agents, and specialized localizers finds that agentic explorers outperform classical retrieval, and that line-level coverage and efficient ranking remain the key differentiators at the frontier. The benchmark addresses a gap in SWE-bench-style evaluations that treat task resolution as a binary outcome.