benchmark

SWE-Bench Lite

benchmark · active · provisional · swe-bench-lite-5114ebc9 · 1 event · first seen 43h ago

Aliases: SWE-Bench Lite

More like this (12)

SWE-bench · SWE-Bench Verified · SWE-Bench Multilingual · SWE-Bench-Pro-Hard-AA · SorryBench · Claw-SWE-Bench · MLE Bench Lite · LiveBench · SkillsBench · CursorBench · SupraBench · SWE-Pro

Recent events (1)

6 · arXiv · cs.CL · 43h ago

SHERLOC: Training-free structured fault localization framework boosts code repair agent performance on SWE-Bench

SHERLOC is a training-free fault localization framework that pairs a reasoning LLM with compact repository tools. Instead of handing code repair agents bare file pointers, it produces structured diagnostic context. At ~30B parameters it reaches 84.33% accuracy@1 on SWE-Bench Lite and 81.27% recall@1 on SWE-Bench Verified, matching or outperforming larger agentic methods. Injecting SHERLOC's diagnostic output into downstream repair agents raises the resolve rate on SWE-Bench Verified by 5.95 percentage points on average, while cutting localization tokens by 36.7% and total tokens by 23.1%. The work targets a concrete inefficiency in agentic coding pipelines: roughly half the inference budget is spent on fault localization before any editing begins.
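The summary reports accuracy@1 and recall@1 without defining them. As a rough illustration only, the sketch below uses one common convention from the fault localization literature: accuracy@k counts an instance as correct when every gold file appears in the top k predictions, and recall@k is the fraction of gold files found in the top k. These definitions, the function names, and the sample data are all assumptions made for illustration and may not match the paper's exact protocol.

```python
from typing import Sequence, Set


def accuracy_at_k(ranked: Sequence[str], gold: Set[str], k: int = 1) -> float:
    """Return 1.0 if all gold files are within the top-k predictions, else 0.0.

    Assumed definition; the paper may score localization differently.
    """
    top_k = set(ranked[:k])
    return 1.0 if gold and gold <= top_k else 0.0


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int = 1) -> float:
    """Return the fraction of gold files that appear in the top-k predictions."""
    if not gold:
        return 0.0
    top_k = set(ranked[:k])
    return len(gold & top_k) / len(gold)


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical instances: (ranked predicted files, gold files edited by the fix).
instances = [
    (["a.py", "b.py"], {"a.py"}),          # top-1 hit
    (["c.py", "d.py"], {"d.py"}),          # gold file ranked second
    (["e.py"], {"e.py", "f.py"}),          # only one of two gold files found
]

mean_accuracy = mean([accuracy_at_k(r, g, k=1) for r, g in instances])
mean_recall = mean([recall_at_k(r, g, k=1) for r, g in instances])

print(f"accuracy@1 = {mean_accuracy:.4f}")
print(f"recall@1   = {mean_recall:.4f}")
```

Under these assumed definitions the two metrics diverge whenever a fix touches more than one file, which is one reason figures reported under different metric names are not directly comparable.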

Evaluation and Benchmarking · Agent and Tool Ecosystem · SWE-Bench Lite · SWE-Bench Verified · SHERLOC