Entity · benchmark

SWE-Lancer

benchmark · active · swe-lancer-7a438b8c · 1 event · first seen May 20, 2026

Aliases: SWE-Lancer

Co-occurring entities

SWE-bench · OpenAI · Upwork

More like this (12)

FrontierSWE · SWE-Agent · SWE-bench · XLSCOUT · SWE-1.7 · Open-SWE · SWE-Pro · SWE-fficiency · SWE-Smith · SWE-Explore · Surfer-H · SWE-Perf

Recent events (1)

7 · OpenAI Blog · May 20, 2026

Introducing the SWE-Lancer benchmark

OpenAI has released SWE-Lancer, a benchmark that evaluates frontier LLMs on real-world freelance software engineering tasks sourced from Upwork, with a total payout value of $1 million. The benchmark tests whether models can complete tasks that human freelancers were paid to do, grounding evaluation in economic value rather than synthetic metrics. This positions SWE-Lancer as a practically oriented complement to existing code benchmarks such as SWE-bench.
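To make the idea of grounding evaluation in economic value concrete, here is a minimal sketch of a payout-weighted score next to a plain pass rate. It assumes each task carries the amount the freelancer was paid and that a model earns that amount only when its solution passes. The `TaskResult` type, its field names, and the sample figures are hypothetical and do not come from the benchmark's own tooling.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str       # hypothetical identifier, for illustration only
    payout_usd: float  # amount the human freelancer was paid for the task
    passed: bool       # whether the model's solution was accepted


def payout_weighted_score(results):
    """Return (earned_usd, total_usd, pass_rate) for a list of task results.

    Assumption: a model "earns" a task's full payout only if it passes.
    """
    if not results:
        raise ValueError("results must not be empty")
    total = sum(r.payout_usd for r in results)
    earned = sum(r.payout_usd for r in results if r.passed)
    pass_rate = sum(1 for r in results if r.passed) / len(results)
    return earned, total, pass_rate


# Made-up sample data: two cheap tasks solved, one expensive task missed.
sample = [
    TaskResult("task-a", 250.0, True),
    TaskResult("task-b", 1000.0, False),
    TaskResult("task-c", 500.0, True),
]
earned, total, pass_rate = payout_weighted_score(sample)
print(f"earned ${earned:.0f} of ${total:.0f}; pass rate {pass_rate:.0%}")
```

In this made-up sample the model passes two of three tasks but earns less than half of the available payout, which is the kind of gap a payout-based measure exposes and a plain pass rate hides.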

Frontier Model Releases · Evaluation and Benchmarking · SWE-bench · OpenAI · SWE-Lancer