
Cursor
cursor-30a51c39 · 18 events · first seen 1mo ago
Aliases: Cursor, Cursor 3
Recent events (18)
Cursor's Composer 2.5 rivals GPT-5.5 and Claude Opus 4.7 on coding benchmarks at lower cost
Cursor released Composer 2.5, a specialized agentic coding model built on Moonshot's Kimi K2.5 open weights with additional pretraining and reinforcement learning fine-tuning tailored to Cursor's own CLI harness. The model ranks third on the Artificial Analysis Coding Agent Index behind Claude Opus 4.7 and GPT-5.5 at max reasoning, but significantly undercuts them on cost ($0.44 vs $4.14 per task) and speed (6.7 vs 17.7 minutes). The training approach—co-optimizing model and harness together using synthetic tasks, text feedback during RL, and 25x more synthetic data than Composer 2—illustrates a specialist model strategy that challenges the dominance of generalist frontier models in coding workflows.
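As a rough illustration of the gap quoted above, the following Python sketch turns the reported per-task figures into ratios. The numbers are taken from the summary; the helper name `times_larger` is made up for this example, and the summary does not say which of the two frontier models the $4.14 and 17.7-minute figures belong to.

```python
def times_larger(frontier: float, composer: float) -> float:
    """Return how many times larger the frontier figure is than Composer 2.5's."""
    if composer <= 0:
        raise ValueError("composer figure must be positive")
    return frontier / composer


# Figures as reported in the summary (per task).
COMPOSER_COST_USD = 0.44
FRONTIER_COST_USD = 4.14
COMPOSER_MINUTES = 6.7
FRONTIER_MINUTES = 17.7

cost_ratio = times_larger(FRONTIER_COST_USD, COMPOSER_COST_USD)
time_ratio = times_larger(FRONTIER_MINUTES, COMPOSER_MINUTES)

print(f"cost: {cost_ratio:.1f}x cheaper per task")
print(f"time: {time_ratio:.1f}x faster per task")
```

On these figures the frontier comparison point costs roughly 9.4 times as much per task and takes roughly 2.6 times as long.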
How Cursor Uses GPT-5
OpenAI published a brief on how Cursor, the AI-powered code editor, integrates GPT-5 into its development workflow. The post highlights a real-world enterprise deployment of GPT-5 in a coding assistant context. This represents a notable use case demonstrating GPT-5's practical adoption in developer tooling.
Data Points: Cursor Composer 2.5, Gemini 3.5 Flash, Antigravity 2.0, Omni Flash, AI Search, and Corti Symphony
This edition covers several notable AI product and model releases: Cursor shipped Composer 2.5 (built on Kimi K2.5), scoring 79.8% on SWE-bench Multilingual at significantly lower cost than frontier competitors; Google released Gemini 3.5 Flash with a claimed 4x speed advantage and launched Antigravity 2.0 as an agent-first desktop app replacing its IDE; Google also introduced Gemini Omni Flash for multimodal video generation and overhauled its search interface with Gemini 3.5. Additionally, Copenhagen-based Corti launched Symphony for Speech-to-Text, achieving a 1.4% word error rate on medical terminology versus 17-19% for generalist models.
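For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a transcript and its reference, divided by the number of reference words. The sketch below is a minimal Python version of that standard definition; it is not Corti's evaluation code, and the sample sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one wrong word out of five reference words.
wer = word_error_rate(
    "patient denies chest pain today",
    "patient denies chess pain today",
)
print(f"WER: {wer:.1%}")
```

One substituted word in a five-word reference gives a WER of 20%, so a 1.4% rate corresponds to roughly one error in every 70 reference words.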
MiniMax M2.7 proprietary reasoning model competes with Gemini and Claude Opus; roundup covers Cursor Composer 2, MAI-Image-2, Claude Code Channels, and Anthropic defense dispute
MiniMax released M2.7, a proprietary reasoning model that scored 66.6% on MLE Bench Lite (tying Gemini 3.1) and 56.22% on SWE-Pro, priced at $0.30/$1.20 per million tokens. The shift to a proprietary release marks a potential strategic pivot among Chinese AI labs away from open weights. Cursor released Composer 2, an agentic coding model built on a fine-tuned Kimi K2.5 (via a Moonshot partnership), priced 86% cheaper than its predecessor and scoring 73.7 on SWE-bench Multilingual. Anthropic released Claude Code Channels, routing Telegram and Discord messages into local Claude Code sessions via MCP plugins, and separately filed a court response denying it has any backdoor or kill switch into military deployments of Claude. Microsoft announced MAI-Image-2, a text-to-image model ranking third on Arena.ai among research labs.
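To make the per-million-token pricing concrete, here is a small Python sketch that prices a single workload. It assumes the quoted $0.30/$1.20 pair means input/output prices, which the summary does not state explicitly, and the token counts are hypothetical.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of a workload given per-million-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )


# Assumed reading of the quoted price pair: $0.30 input, $1.20 output.
# The workload size below is hypothetical.
cost = request_cost_usd(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_million=0.30,
    output_price_per_million=1.20,
)
print(f"${cost:.2f}")
```

Under those assumptions, two million input tokens and half a million output tokens would cost about $1.20 in total.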
CodeGraph: Pre-indexed Local Code Knowledge Graph for AI Coding Agents
CodeGraph is an open-source TypeScript tool that builds a pre-indexed knowledge graph of a codebase to reduce token usage and tool calls for AI coding agents including Claude Code, Codex, Cursor, OpenCode, and Hermes Agent. It runs entirely locally, positioning itself as an efficiency layer between codebases and LLM-based coding assistants. The project gained significant traction with 3,688 stars in a single day, reaching 16,371 total stars.
Empirical study finds 80% of AI agent-authored test patches lack meaningful verification logic
A large-scale empirical study of 86,156 test-file patches from 33,596 agent-authored GitHub PRs finds that 80.2% contain weak or no explicit oracle signals, meaning they execute code without verifying behavior. The study covers five coding agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code) across 2,807 repositories, and introduces a syntactic taxonomy of eight oracle signal categories. Although raw merge rates are lower for patches with strong oracles, regression analysis shows that strong oracles significantly improve merge likelihood (OR=1.28), suggesting that current quality gates based on test-file presence substantially overestimate verification strength.
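An odds ratio multiplies the odds of an outcome, not its probability, so its effect depends on the starting point. The Python sketch below shows what OR=1.28 would mean for a few baseline merge probabilities. The baselines are hypothetical and not taken from the study, and the calculation ignores the other covariates a regression would control for.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a baseline probability to the probability implied by an odds ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


REPORTED_OR = 1.28  # odds ratio quoted in the summary

# Hypothetical baseline merge probabilities, for illustration only.
for baseline in (0.3, 0.5, 0.7):
    adjusted = apply_odds_ratio(baseline, REPORTED_OR)
    print(f"baseline {baseline:.0%} -> with strong oracle {adjusted:.1%}")
```

For a hypothetical 50% baseline, an odds ratio of 1.28 corresponds to a merge probability of about 56%.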
claude-skills: 313+ Skill/Plugin Collection for Claude Code and Multi-Agent Coding Tools
A GitHub repository providing 313+ reusable skills, agent plugins, and workflow templates targeting Claude Code, Codex, Gemini CLI, Cursor, and eight other coding agents. Coverage spans engineering, marketing, compliance, C-level advisory, finance, and productivity domains. The project has accumulated 15,476 stars with 157 added today, indicating strong community traction. It represents a growing ecosystem of structured prompt/skill libraries designed to extend AI coding agents beyond pure code generation.
agent-skills: Secure Validated Skill Registry for AI Coding Agents
A TypeScript-based open-source skill registry designed to extend AI coding agents including Claude Code, Cursor, GitHub Copilot, and Antigravity with validated, reusable capabilities. The project provides a structured way to add skills to multiple coding agent platforms with a focus on security and validation. It is gaining notable traction with 3,767 total stars and 225 stars added today.
SkillKit: Portable Skills Layer for AI Coding Agents
SkillKit is an open-source TypeScript project that provides a portable skills abstraction for AI coding agents, enabling installation, translation, and sharing of skills across tools like Claude Code, Cursor, Codex, GitHub Copilot, and 40+ others. The project has accumulated 1,112 stars with 32 added today, indicating moderate community traction. It targets the interoperability gap between the growing ecosystem of AI coding assistants.
Data Points: NeurIPS-China Standoff, Anthropic Emotion Vectors, Gemma 4, Cursor 3, Microsoft MAI Models
This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.
Claude 3.7 Sonnet and Claude Code: Anthropic's First Hybrid Reasoning Model and Agentic Coding Tool
Anthropic has released Claude 3.7 Sonnet, described as their most capable model to date and the first hybrid reasoning model on the market, capable of operating in both standard and extended thinking modes within a single unified model. The model achieves state-of-the-art results on SWE-bench Verified and TAU-bench, with particular strength in coding and front-end web development. Alongside the model, Anthropic is launching Claude Code in limited research preview, a command-line agentic coding tool that can read/edit files, run tests, and push to GitHub. Pricing remains unchanged at $3/M input and $15/M output tokens, with availability across Claude.ai plans, Amazon Bedrock, and Google Cloud Vertex AI.
Anthropic Introduces Claude Opus 4 and Sonnet 4 with Leading Coding Benchmarks and Agent Capabilities
Anthropic has released Claude Opus 4 and Claude Sonnet 4, positioning Opus 4 as the world's best coding model with 72.5% on SWE-bench and 43.2% on Terminal-bench, and Sonnet 4 at 72.7% on SWE-bench. Both models are hybrid (near-instant + extended thinking), support extended thinking with tool use in beta, parallel tool execution, and improved memory via local file access. Alongside the models, Anthropic is launching Claude Code as generally available with GitHub Actions, VS Code, and JetBrains integrations, plus four new API capabilities: code execution tool, MCP connector, Files API, and one-hour prompt caching. Pricing is unchanged from prior Opus and Sonnet tiers ($15/$75 and $3/$15 per million tokens respectively), with availability on Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI.
Anthropic Releases Claude Opus 4.5 with State-of-the-Art Coding, Agent, and Computer Use Capabilities
Anthropic has released Claude Opus 4.5, positioning it as the best model in the world for coding, agentic workflows, and computer use, with pricing reduced to $5/$25 per million input/output tokens. The model demonstrates significant token efficiency gains—up to 65% fewer tokens than prior models on equivalent tasks—alongside improvements in long-horizon autonomous task execution, multi-step reasoning, and self-improving agent behavior. The release is accompanied by updates to Claude Code, the Claude Developer Platform, and integrations with Excel, Chrome, and desktop environments. Early partner feedback from GitHub Copilot, Cursor, Notion, Warp, and others reports measurable benchmark improvements and new use cases previously out of reach.
Anthropic Releases Claude Sonnet 4.5: Top Coding and Computer-Use Model with Agent SDK
Anthropic has released Claude Sonnet 4.5, claiming it is the best coding model and strongest model for building complex agents, with a 61.4% score on OSWorld (up from 42.2% for Sonnet 4) and state-of-the-art performance on SWE-bench Verified. The release is accompanied by major product upgrades including checkpoints in Claude Code, a native VS Code extension, a Claude Agent SDK giving developers access to the same infrastructure powering Claude Code, and new context editing and memory tools in the Claude API. Pricing is unchanged from Sonnet 4 at $3/$15 per million input/output tokens. Early enterprise customers including Cursor, GitHub Copilot, Devin, Canva, and Figma report significant gains in coding, agentic, and long-context tasks.
Google Labs Releases Stitch Skills: Agent Skills Library for MCP-Compatible Coding Agents
Google Labs has published stitch-skills, a TypeScript library of Agent Skills designed to work with the Stitch MCP server. The library follows the Agent Skills open standard, enabling compatibility with multiple coding agents including Gemini CLI, Claude Code, Cursor, and Antigravity. The repository has accumulated 5,597 stars with 70 added today, indicating active community interest in the MCP/agent tooling ecosystem.
Anthropic-Cybersecurity-Skills: 754 Structured Cybersecurity Skills for AI Agents
A GitHub repository providing 754 structured cybersecurity skills designed for AI coding agents, mapped to five major frameworks including MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, and NIST AI RMF. The skills are organized across 26 security domains and conform to the agentskills.io standard. The project claims compatibility with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI, and 20+ other platforms. It has accumulated 7,330 stars with 238 added today, indicating notable community traction.
Anthropic raises Series E at $61.5B post-money valuation
Anthropic has closed a $3.5 billion Series E round at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners with participation from Bessemer, Cisco, Fidelity, General Catalyst, Salesforce Ventures, and others. Proceeds will fund next-generation AI system development, expanded compute capacity, mechanistic interpretability and alignment research, and international expansion. The raise follows the launch of Claude 3.7 Sonnet and Claude Code, with Anthropic citing strong enterprise adoption across customers including Cursor, Zoom, Snowflake, Pfizer, and Amazon's Alexa+.
Understand-Anything: interactive knowledge graph tool for code exploration using AI assistants
Egonex-AI has released Understand-Anything, a TypeScript tool that converts codebases into interactive knowledge graphs that can be explored, searched, and queried. The tool integrates with multiple AI coding assistants including Claude Code, Codex, Cursor, GitHub Copilot, and Gemini CLI. It has accumulated 62,256 GitHub stars with 1,146 added today, indicating strong community traction.