Almanac
company

Upwork

company · active · upwork-d37b26c8 · 1 event · first seen 28d ago

Aliases: Upwork

Recent events (1)

7 · OpenAI Blog · 28d ago

Introducing the SWE-Lancer benchmark

OpenAI has released SWE-Lancer, a benchmark that evaluates frontier LLMs on real-world freelance software engineering tasks sourced from Upwork, with a total payout value of $1 million. The benchmark tests whether models can complete tasks that human freelancers were paid to do, grounding evaluation in economic value rather than synthetic metrics. This positions SWE-Lancer as a practically oriented complement to existing code benchmarks such as SWE-bench.