Entity · product

Devin

product·active·devin-63d8e29c·9 events·first seen May 18, 2026

Aliases: Devin

Co-occurring entities

More like this (12)

Devin Fusion Adam David Chen Ethan Mollick Alyah Aditi Krishnapriyan Jared Kaplan Jay Shim Prodigy AdamW Daniel Rock Deloitte

Recent events (9)

9·Hacker News·5d ago

Anthropic releases Claude Opus 5

Anthropic has announced Claude Opus 5, a new flagship model. The item links to Anthropic's official news domain, so it is a primary-source announcement. Opus 5 would be a significant step beyond the current Claude Opus 4.8 flagship and is likely to be a major frontier model release.

Frontier Model Releases Inference Economics Zapier Claude Max Claude Opus 4.6 +15 more

6·The Batch·Jul 16, 2026

Data Points: PrismML fits 27B model on iPhone; Cognition SWE-1.7, Nvidia Audex, Anthropic language-value study

A newsletter digest covers four notable AI developments: PrismML (a Caltech/Khosla spinout) compressed Alibaba's Qwen 27B model to under 4 GB via ternary/binary quantization for on-device iPhone inference; Cognition released SWE-1.7 (trained on Kimi K2.7), jumping from 9.4% to 42.3% on FrontierCode 1.1 Main with novel RL and infrastructure techniques; Nvidia introduced Audex, a 30B unified audio-text transformer trained on 157B audio tokens; and Anthropic published research showing Claude's expressed values shift measurably by language across 309,815 conversations. Each item represents a distinct technical development across on-device inference, coding agents, multimodal models, and model behavior analysis.

Inference Economics Agent and Tool Ecosystem Kimi K2 Claude Sonnet Claude Opus 4.6 +18 more

6·arXiv · cs.AI·Jun 17, 2026

Empirical study finds 80% of AI agent-authored test patches lack meaningful verification logic

A large-scale empirical study of 86,156 test-file patches from 33,596 agent-authored GitHub PRs finds that 80.2% contain weak or no explicit oracle signals, meaning they execute code without verifying its behavior. The study covers five coding agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code) across 2,807 repositories and introduces a syntactic taxonomy of eight oracle signal categories. Although patches with strong oracles have lower raw merge rates, regression analysis shows that strong oracles significantly improve merge likelihood (OR=1.28), suggesting that quality gates based on test-file presence substantially overestimate verification strength.

Evaluation and Benchmarking Agent and Tool Ecosystem GitHub Devin Cursor +4 more
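
To make the "weak or no oracle" finding in the arXiv item above concrete, here is a minimal Python sketch. It is an illustration only, not the study's taxonomy or tooling: `slugify`, `broken_slugify`, `weak_test`, `strong_test`, and `passes` are hypothetical names invented for this example. It shows a test that merely executes code passing against a broken implementation, while a test with an explicit assertion catches the bug.

```python
from typing import Callable


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase words joined by hyphens."""
    return "-".join(title.lower().split())


def broken_slugify(title: str) -> str:
    """Hypothetical buggy variant: returns the input unchanged."""
    return title


def weak_test(impl: Callable[[str], str]) -> None:
    # Weak oracle: the code runs, but nothing about the result is checked.
    impl("Hello World")


def strong_test(impl: Callable[[str], str]) -> None:
    # Strong oracle: an explicit assertion on the expected behavior.
    assert impl("Hello World") == "hello-world"


def passes(test: Callable[[Callable[[str], str]], None],
           impl: Callable[[str], str]) -> bool:
    """Return True if the test finishes without an assertion failure."""
    try:
        test(impl)
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    for name, impl in [("correct", slugify), ("broken", broken_slugify)]:
        print(name, "weak:", passes(weak_test, impl),
              "strong:", passes(strong_test, impl))
```

A quality gate that only checks whether a test file exists treats both tests as equivalent, which is the overestimate the summary describes.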

9·Anthropic News·Jun 1, 2026

Anthropic Releases Claude Sonnet 4.5: Top Coding and Computer-Use Model with Agent SDK

Anthropic has released Claude Sonnet 4.5, claiming it is the best coding model and strongest model for building complex agents, with a 61.4% score on OSWorld (up from 42.2% for Sonnet 4) and state-of-the-art performance on SWE-bench Verified. The release is accompanied by major product upgrades including checkpoints in Claude Code, a native VS Code extension, a Claude Agent SDK giving developers access to the same infrastructure powering Claude Code, and new context editing and memory tools in the Claude API. Pricing is unchanged from Sonnet 4 at $3/$15 per million input/output tokens. Early enterprise customers including Cursor, GitHub Copilot, Devin, Canva, and Figma report significant gains in coding, agentic, and long-context tasks.

Frontier Model Releases Evaluation and Benchmarking Canva Claude for Chrome Figma +13 more
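
As a rough illustration of the per-million-token pricing quoted in the Sonnet 4.5 item above ($3 input, $15 output), the following Python sketch estimates the cost of a single request. The function name `request_cost_usd` and the token counts are assumptions made for this example. It ignores any caching or batch discounts, which the summary does not describe.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_million: float = 3.0,    # USD per 1M input tokens, from the summary
    output_rate_per_million: float = 15.0,  # USD per 1M output tokens, from the summary
) -> float:
    """Estimate the cost in USD of one request under flat per-token rates."""
    total = (input_tokens * input_rate_per_million
             + output_tokens * output_rate_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 10,000 output tokens.
    print(f"${request_cost_usd(200_000, 10_000):.2f}")  # prints "$0.75"
```

Passing different rates (for example the $5/$25 quoted for Opus 4.7 in the May 18 item) gives the corresponding estimate for another model.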

5·Latent Space·May 28, 2026

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

A Latent Space podcast episode features Cognition's Walden Yan and OpenInspect's Cole Murray discussing the current state of autonomous software engineering agents. Topics include Devin's reported 80% commit rate, spec-to-PR workflows, full VM environments for agents, agent memory, and the emerging pattern of product managers shipping code directly. The conversation covers practical deployment patterns and tooling for async agentic coding workflows.

Frontier Model Releases Enterprise Deployment Patterns Devin Cole Murray Cognition +4 more

8·Hacker News·May 28, 2026

Claude Opus 4.8 Released by Anthropic

Anthropic has released Claude Opus 4.8, a new frontier model in their Claude lineup. The announcement appeared on Anthropic's official news page and generated significant community engagement on Hacker News with over 1,000 points and 800+ comments. Specific capability details and benchmarks are not available from the source snippet alone.

Frontier Model Releases Evaluation and Benchmarking claude.ai Claude Opus 4.6 Databricks +16 more

7·Latent Space·May 28, 2026

Cognition raises $1B in $26B Series D

Cognition, the AI coding agent company behind Devin, has raised $1B in a Series D round at a $26B valuation. The round signals continued investor conviction in autonomous coding agents as a large and growing market. The Latent Space newsletter frames coding as an 'uncapped TAM market,' reflecting broader industry sentiment around AI-driven software development.

Enterprise Deployment Patterns Agent and Tool Ecosystem Devin Cognition Latent Space

4·OpenAI Blog·May 20, 2026

Coding with OpenAI o1

OpenAI published a brief feature in which Scott Wu, CEO of Cognition (maker of the Devin AI software engineer), describes how o1 approaches coding decisions in a more human-like, reasoning-oriented manner. The piece is a short promotional commentary tied to the o1 model launch, highlighting o1's potential impact on AI-assisted software development. No new technical benchmarks or capability details are disclosed.

Frontier Model Releases Agent and Tool Ecosystem Scott Wu Devin Cognition +1 more

8·Anthropic News·May 18, 2026

Anthropic Releases Claude Opus 4.7 with Enhanced Coding, Vision, and Cyber Safeguards

Anthropic has released Claude Opus 4.7, a general-availability model positioned as a meaningful improvement over Opus 4.6 in advanced software engineering, long-horizon agentic tasks, and vision capabilities including higher image resolution. The model is notably the first to receive new cybersecurity safeguards developed in response to Project Glasswing, with automatic detection and blocking of prohibited cyber uses and a new Cyber Verification Program for legitimate security professionals. Opus 4.7 is available across Claude products, API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at the same pricing as Opus 4.6 ($5/$25 per million input/output tokens). The release is explicitly positioned below Claude Mythos Preview in overall capability, serving as a testbed for safety mechanisms before broader deployment of Mythos-class models.

Frontier Model Releases Evaluation and Benchmarking Harvey Solve Intelligence Amazon Bedrock +16 more