Entity · benchmark

SWE-Marathon

benchmark · active · swe-marathon-f820b6a0 · 1 event · first seen Jun 17, 2026

Aliases: SWE-Marathon

Co-occurring entities

DRACO · FrontierSWE · Anysphere · OpenRouter Fusion · Claude Opus 4.6 · Munich Regional Court · DeepSeek V4 · Google · SpaceX · SWE-bench · Cursor · xAI · Gemini 3 Flash · IndexShare · Kimi K2.6 · Claude Code · Zhipu AI · OpenRouter · OpenAI · GLM-5.1

More like this (12)

SWE-Gym · SWE-Pro · SWE-Perf · SWE-Explore · Open-SWE · SWE-1.7 · SWE-Smith · SWE-Interact · SWE-Bench Lite · SWE-Agent · SWE-bench · SWE-fficiency

Recent events (1)

7 · The Batch · Jun 17, 2026

Data Points: GLM-5.2 leads open models on coding benchmarks; SpaceX acquires Cursor; OpenRouter Fusion; Anthropic coding study; ChatGPT market share drops

Zhipu released GLM-5.2, a 744B-parameter open model under the MIT license with a 1M-token context window and a 2.9× compute reduction via IndexShare attention. It ranks second only to Claude Opus 4.8 on long-horizon coding benchmarks, including FrontierSWE and SWE-Marathon. SpaceX is acquiring Cursor (Anysphere) for $60B in stock, positioning Musk's company to compete in AI software tools using xAI's Colossus infrastructure. OpenRouter launched Fusion, a multi-model synthesis tool showing that panels of budget models can match frontier model performance at half the cost. An Anthropic study of 400K Claude Code sessions found that domain expertise, not coding skill, is the primary driver of agentic output. Separately, a Munich court ruled Google liable for false claims in AI Overviews.

Frontier Model Releases · Evaluation and Benchmarking · DRACO · FrontierSWE · Anysphere