Artificial Analysis Coding Agent Index

benchmark · active · artificial-analysis-coding-agent-index-d932c774 · 1 event · first seen Jun 12, 2026

Aliases: Artificial Analysis Coding Agent Index

Co-occurring entities

SWE-Bench-Pro-Hard-AA · Claude Opus 4.6 · SpaceX · Cursor · CursorBench · Claude Code · OpenAI · Composer 2 · Kimi K2.5 · Moonshot AI · Codex · GPT-5.5 · Anthropic

More like this (12)

Artificial Analysis Intelligence Index · Artificial Analysis · Artificial Analysis Intelligence Leaderboard · Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents · Artificial Analysis Text to Image · coding agents · OpenAI internal coding agents · CodeAgents · Artificial Analysis Conversational Dynamics · Artificial Analysis Big Bench Audio · Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents? · AI Agents

Recent events (1)

6 · The Batch · Jun 12, 2026

Cursor's Composer 2.5 rivals GPT-5.5 and Claude Opus 4.7 on coding benchmarks at lower cost

Cursor released Composer 2.5, a specialized agentic coding model built on Moonshot's Kimi K2.5 open weights, with additional pretraining and reinforcement-learning fine-tuning tailored to Cursor's own CLI harness. The model ranks third on the Artificial Analysis Coding Agent Index, behind Claude Opus 4.7 and GPT-5.5 at max reasoning, but it is significantly cheaper ($0.44 vs. $4.14 per task) and faster (6.7 vs. 17.7 minutes). Cursor co-optimized the model and harness together, using synthetic tasks, text feedback during RL, and 25x more synthetic data than Composer 2. The approach illustrates a specialist-model strategy that challenges the dominance of generalist frontier models in coding workflows.

Frontier Model Releases · Inference Economics · SWE-Bench-Pro-Hard-AA · Claude Opus 4.6 · SpaceX