Codex

product · active · codex-6cf43bae · 58 events · first seen 1mo ago

Aliases: Codex, codex-1

More like this (12)

Codex 5.3, Codex CLI, Codex SDK, Codex Security, Codex Labs, OpenAI Codex, Codex App Server, GPT-5.3-Codex, Codex HumanEval, Codeium, GPT-5.1-Codex-Max, Colossus 1

Guides (1)

Codex

Codex: OpenAI's AI Coding Agent

Recent events (50)

7 · OpenAI Blog · 1mo ago

Scaling Codex to enterprises worldwide

OpenAI is launching Codex Labs and forming partnerships with major consulting and IT firms including Accenture, PwC, and Infosys to accelerate enterprise adoption of Codex across the software development lifecycle. The announcement reports 4 million weekly active users for Codex. This represents a significant push to embed OpenAI's coding AI into large-scale enterprise workflows through established system integrators.

Frontier Model Releases Enterprise Deployment Patterns PwC Infosys Accenture +4 more

6 · OpenAI Blog · 1mo ago

Codex for (almost) everything: OpenAI expands Codex app with computer use, browsing, image generation, memory, and plugins

OpenAI has updated its Codex desktop application for macOS and Windows with a broad set of new capabilities including computer use, in-app browsing, image generation, persistent memory, and plugin support. The update positions Codex as a more comprehensive agentic developer tool rather than a pure code-completion assistant. These additions bring Codex closer to a general-purpose AI agent environment targeting developer workflows.

Enterprise Deployment Patterns Agent and Tool Ecosystem plugins OpenAI memory +3 more

6 · OpenAI Blog · 1mo ago

Introducing upgrades to Codex

OpenAI has announced upgrades to Codex, its AI coding agent, improving speed, reliability, and real-time collaboration capabilities. The updates extend Codex's reach across multiple development environments including terminal, IDE, web, and mobile. The announcement emphasizes both interactive collaboration and autonomous task execution.

Frontier Model Releases Agent and Tool Ecosystem OpenAI Codex

8 · OpenAI Blog · 1mo ago

Introducing Codex

OpenAI announced Codex on its official blog as a product or capability targeting software development and coding tasks. The body of the announcement was not available for this summary, so capabilities, pricing, and availability are not described here; the name and OpenAI's history suggest an AI-powered coding agent or an updated code generation system.

Frontier Model Releases Enterprise Deployment Patterns OpenAI Codex +1 more

8 · OpenAI Blog · 1mo ago

Evaluating Large Language Models Trained on Code

OpenAI published research on evaluating large language models trained on code, introducing the Codex model and the HumanEval benchmark for assessing code generation capabilities. The work established foundational methodology for measuring functional correctness of code produced by LLMs using a pass@k metric. This paper became a landmark reference for code-focused LLM evaluation and influenced subsequent code generation research across the field.

Frontier Model Releases Evaluation and Benchmarking GPT-3 pass@k OpenAI +3 more
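
The pass@k metric named in the entry above can be made concrete with a short sketch. The paper estimates it per problem by drawing n samples, counting the c that pass the unit tests, and computing 1 - C(n-c, k) / C(n, k); the benchmark score is the average over problems. The function name and argument checks below are illustrative choices, not code from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Drawing more samples than k (n > k) and using this estimator gives a lower-variance figure than generating exactly k samples and checking whether any of them passes.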

5 · OpenAI Blog · 18d ago

OpenAI expands Codex with plugins, sites, and annotations for non-engineering roles

OpenAI announced new Codex capabilities including plugins, sites, and annotations targeting analysts, marketers, designers, investors, and other non-engineering teams. The expansion positions Codex as a broader productivity platform beyond software development. This represents a product surface expansion for OpenAI's coding-focused AI agent.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenAI Codex

4 · OpenAI Blog · 9d ago

Astrophysicist uses OpenAI Codex to build black hole simulations

Astrophysicist Chi-kwan Chan uses OpenAI's Codex to assist in building simulations of black holes, enabling study of extreme physics and testing of Einstein's general relativity. The piece is a deployment case study from OpenAI's blog highlighting scientific use of Codex in computational astrophysics research.

Enterprise Deployment Patterns Agent and Tool Ecosystem Chi-kwan Chan OpenAI Codex

5 · OpenAI Blog · 1mo ago

Building a safe, effective sandbox to enable Codex on Windows

OpenAI describes the engineering work behind a secure sandbox environment for running Codex coding agents on Windows. The sandbox enforces controlled file access and network restrictions to enable safe, efficient agentic code execution. This is part of OpenAI's broader effort to deploy coding agents in production environments with appropriate isolation guarantees.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenAI Codex Windows Sandbox

5 · OpenAI Blog · 1mo ago

How NVIDIA Engineers and Researchers Build with Codex

OpenAI published a case study describing how NVIDIA teams use Codex powered by GPT-5.5 to ship production systems and accelerate research experimentation. The piece highlights enterprise adoption of Codex as a coding agent in a major hardware/AI lab context. It signals continued real-world deployment of OpenAI's agentic coding tools at scale.

Frontier Model Releases Enterprise Deployment Patterns NVIDIA OpenAI Codex +2 more

5 · OpenAI Blog · 1mo ago

Running Codex Safely at OpenAI

OpenAI published a blog post describing the security architecture used to run Codex as a coding agent internally, covering sandboxing, human approval workflows, network policies, and agent-native telemetry. The post is aimed at supporting enterprise adoption of coding agents by demonstrating safe and compliant deployment patterns. It provides operational detail on how OpenAI itself governs agentic code execution in production.

AI Safety Research Enterprise Deployment Patterns sandboxing OpenAI agent-native telemetry +2 more

5 · OpenAI Blog · 1mo ago

Work with Codex from anywhere

OpenAI is extending Codex access to the ChatGPT mobile app, enabling users to monitor, steer, and approve coding tasks in real time from mobile devices and remote environments. This update brings Codex's agentic coding capabilities beyond desktop/web interfaces. The announcement positions Codex as a persistent, cross-device coding agent rather than a session-bound tool.

Enterprise Deployment Patterns Agent and Tool Ecosystem ChatGPT OpenAI Codex

6 · OpenAI Blog · 1mo ago

OpenAI and Dell Partner to Bring Codex to Hybrid and On-Premise Enterprise Environments

OpenAI and Dell Technologies have announced a partnership to deploy Codex, OpenAI's AI coding agent, in hybrid and on-premise enterprise environments. The collaboration targets enterprises requiring secure, local deployment of AI coding capabilities across their data and workflows. This extends Codex's reach beyond cloud-only access into infrastructure-sensitive enterprise settings.

Inference Economics Enterprise Deployment Patterns OpenAI Dell Technologies Codex +1 more

5 · OpenAI Blog · 1mo ago

Harness Engineering: Leveraging Codex in an Agent-First World

OpenAI published a technical post by Ryan Lopopolo on using Codex in an agent-first engineering workflow. It appears to cover practical patterns for integrating Codex into software development pipelines where AI agents take a more central role, along with lessons from deploying Codex at scale.

Enterprise Deployment Patterns Agent and Tool Ecosystem Ryan Lopopolo OpenAI Codex

6 · OpenAI Blog · 1mo ago

Unlocking the Codex Harness: How OpenAI Built the App Server

OpenAI published a technical deep-dive on the Codex App Server, a bidirectional JSON-RPC API designed to embed the Codex coding agent into external applications. The server supports streaming progress updates, tool use, human-in-the-loop approvals, and diff outputs. The post explains the architectural choices enabling developers to integrate Codex agent capabilities programmatically.

Frontier Model Releases Enterprise Deployment Patterns Codex App Server JSON-RPC OpenAI +2 more
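
To illustrate what "bidirectional JSON-RPC" means in the entry above, here is a minimal sketch of the three JSON-RPC 2.0 message shapes and how a peer can tell them apart. It is not the App Server's actual protocol: the method names (`example/startTask`, `example/progress`, `example/requestApproval`) are hypothetical, and both the `jsonrpc` field and the newline-delimited framing are assumptions taken from the generic JSON-RPC 2.0 convention rather than from the post.

```python
import json
from typing import Any


def request(msg_id: int, method: str, params: dict[str, Any]) -> dict[str, Any]:
    """A call that expects a response carrying the same id."""
    return {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}


def notification(method: str, params: dict[str, Any]) -> dict[str, Any]:
    """A one-way message (no id), e.g. a streamed progress update."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


def response(msg_id: int, result: Any) -> dict[str, Any]:
    """The reply to an earlier request with the given id."""
    return {"jsonrpc": "2.0", "id": msg_id, "result": result}


def kind(msg: dict[str, Any]) -> str:
    """Classify a decoded message as request, notification, or response."""
    if "method" in msg:
        return "request" if "id" in msg else "notification"
    return "response"


def encode(msg: dict[str, Any]) -> str:
    """Serialize one message per line (framing is an assumption here)."""
    return json.dumps(msg) + "\n"


# Hypothetical exchange: the client starts a task, the server streams
# progress, then the server itself sends a request asking for approval.
exchange = [
    ("client", request(1, "example/startTask", {"prompt": "fix the failing test"})),
    ("server", notification("example/progress", {"text": "running tests"})),
    ("server", request(7, "example/requestApproval", {"command": "git push"})),
    ("client", response(7, {"approved": False})),
    ("server", response(1, {"diff": "..."})),
]
```

The point of the bidirectional design is visible in the third message: the server originates a request of its own, which is how a human-in-the-loop approval can interrupt a task that the client started.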

5 · OpenAI Blog · 1mo ago

Cisco and OpenAI redefine enterprise engineering with AI agents

Cisco and OpenAI have announced a partnership embedding Codex, OpenAI's AI software agent, into Cisco's enterprise engineering workflows. The integration aims to accelerate software builds, automate defect remediation, and enable AI-native development practices at enterprise scale. This represents a significant enterprise deployment of agentic coding capabilities within a major networking and infrastructure company.

Enterprise Deployment Patterns Agent and Tool Ecosystem Cisco OpenAI Codex

5 · OpenAI Blog · 1mo ago

How We Used Codex to Ship Sora for Android in 28 Days

OpenAI used its Codex AI coding assistant to ship the Sora Android app in 28 days, leveraging AI-assisted planning, code translation, and parallel coding workflows. The case study highlights how a small team achieved rapid mobile development by integrating Codex throughout the engineering process. This serves as a concrete internal deployment example of agentic coding tools accelerating software delivery.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenAI Sora Codex

6 · OpenAI Blog · 1mo ago

OpenAI Publishes System Card Addendum for Codex Agent and codex-1 Model

OpenAI released an addendum to the o3 and o4-mini system cards covering Codex, a cloud-based coding agent powered by codex-1—a variant of o3 fine-tuned for software engineering via reinforcement learning on real-world coding tasks. codex-1 is designed to produce code matching human style and PR conventions, follow instructions precisely, and iterate on tests until they pass. The addendum provides safety and capability documentation for this specialized agentic deployment.

Frontier Model Releases AI Safety Research o3 and o4-mini system card o4-mini OpenAI +4 more

5 · OpenAI Blog · 24d ago

Building Self-Improving Tax Agents with Codex

OpenAI, Thrive, and Crete collaborated to build a self-improving tax agent using Codex, targeting automation of tax filings, accuracy improvements, and workflow acceleration. The system demonstrates an agentic deployment pattern where the agent iteratively improves its own performance. This represents a concrete enterprise deployment case study of OpenAI's Codex in a high-stakes professional domain.

Enterprise Deployment Patterns Agent and Tool Ecosystem Crete Thrive OpenAI +1 more

4 · OpenAI Blog · 22d ago

How Braintrust turns customer requests into code with Codex

Braintrust engineers are using OpenAI's Codex with GPT-5.5 to accelerate coding workflows and run experiments faster. The post describes how the team integrates Codex into their development process to convert customer requests into working code. This is a deployment case study highlighting practical use of OpenAI's latest coding-focused model in a production engineering context.

Frontier Model Releases Enterprise Deployment Patterns Braintrust OpenAI Codex +2 more

5 · OpenAI Blog · 16d ago

Wasmer used OpenAI Codex with GPT-5.5 to build a Node.js edge runtime 10-20x faster

Wasmer used OpenAI's Codex powered by GPT-5.5 to build a Node.js runtime for edge computing, reporting 10x to 20x development acceleration and shipping in weeks instead of months. The case study is published on the OpenAI blog as a deployment showcase. It provides concrete evidence of agentic coding tools compressing development timelines for systems-level infrastructure work.

Enterprise Deployment Patterns Agent and Tool Ecosystem Wasmer OpenAI Codex +1 more

7 · OpenAI Blog · 1mo ago

Introducing workspace agents in ChatGPT

OpenAI is launching workspace agents in ChatGPT, powered by Codex, designed to automate complex multi-step workflows in the cloud. These agents are aimed at teams and enterprises, enabling work to scale across tools securely. The announcement positions ChatGPT as an agentic platform for organizational productivity rather than just a conversational assistant.

Frontier Model Releases Enterprise Deployment Patterns ChatGPT workspace agents OpenAI +2 more

6 · OpenAI Blog · 1mo ago

An open-source spec for orchestration: Symphony

OpenAI has released Symphony, an open-source specification for orchestrating Codex-based agents. The spec is designed to connect issue trackers to always-on agent systems, aiming to increase engineering throughput and reduce context switching. Symphony represents OpenAI's push to standardize how software engineering agents are coordinated at the workflow level.

Enterprise Deployment Patterns Agent and Tool Ecosystem Symphony OpenAI Codex

7 · OpenAI Blog · 1mo ago

OpenAI to Acquire Astral

OpenAI has announced its acquisition of Astral, a developer tools company known for high-performance Python tooling (including the Ruff linter and uv package manager). The acquisition is framed as accelerating growth of OpenAI's Codex platform to power next-generation Python developer tools. This represents a strategic move by OpenAI to vertically integrate software development tooling with its AI coding capabilities.

Frontier Model Releases Enterprise Deployment Patterns Python uv OpenAI +4 more

6 · OpenAI Blog · 1mo ago

Unrolling the Codex Agent Loop

OpenAI published a technical deep dive into the Codex CLI agent loop, detailing how it orchestrates models, tools, and prompts via the Responses API. The post explains the internal architecture of the agentic coding system, including how the loop manages state, tool calls, and performance. This provides concrete implementation detail on how OpenAI structures production agent workflows on top of its API primitives.

Inference Economics Enterprise Deployment Patterns Responses API OpenAI Codex CLI +2 more

4 · OpenAI Blog · 1mo ago

Datadog uses Codex for system-level code review

OpenAI has published a case study describing Datadog's deployment of Codex for system-level code review tasks. The announcement highlights an enterprise adoption pattern where a major observability/monitoring company integrates OpenAI's code-focused model into production engineering workflows. Specific technical details about the integration scope, model version, or performance metrics are not available from the provided content.

Enterprise Deployment Patterns Agent and Tool Ecosystem Datadog OpenAI Codex

5 · OpenAI Blog · 29d ago

OpenAI Named a Leader in Gartner 2026 Magic Quadrant for Enterprise AI Coding Agents

Gartner has named OpenAI a Leader in its 2026 Magic Quadrant for Enterprise AI Coding Agents, with Codex specifically recognized for innovation and enterprise-scale deployment. This is a tier-1 analyst recognition that signals OpenAI's competitive positioning in the enterprise agentic coding market. The designation reflects growing institutional adoption of AI coding agents at scale.

Enterprise Deployment Patterns Agent and Tool Ecosystem Gartner Magic Quadrant for Enterprise AI Coding Agents Gartner OpenAI +1 more

7 · OpenAI Blog · 19d ago

OpenAI Frontier Models and Codex Now Generally Available on AWS

OpenAI has made its frontier models and Codex generally available on Amazon Web Services, enabling enterprise customers to access OpenAI capabilities through AWS environments, controls, and procurement workflows. This gives organizations a new deployment path that integrates with existing AWS infrastructure. The move is aimed at accelerating enterprise adoption by reducing friction between evaluation and production deployment.

Inference Economics Enterprise Deployment Patterns OpenAI frontier models OpenAI Amazon Web Services +2 more

5 · Hugging Face Blog · 1mo ago

Custom CUDA Kernels for All from Codex and Claude

A Hugging Face blog post describes using AI coding agents (Codex and Claude) to automatically generate custom CUDA kernels, lowering the barrier to GPU kernel development. The piece demonstrates agent-assisted GPU programming as a practical workflow for ML practitioners. This represents a concrete application of AI coding tools to the specialized domain of CUDA/GPU optimization.

Training Infrastructure Inference Economics Claude Hugging Face OpenAI +4 more

7 · OpenAI Blog · 1mo ago

OpenAI models, Codex, and Managed Agents come to AWS

OpenAI has announced that its GPT models, Codex, and Managed Agents are now available on AWS, allowing enterprise customers to deploy OpenAI capabilities within their existing AWS environments. The partnership extends OpenAI's distribution reach into the major cloud hyperscaler ecosystem. This follows a broader industry pattern of AI labs partnering with cloud providers to reach enterprise customers through familiar procurement and compliance channels.

Inference Economics Enterprise Deployment Patterns OpenAI Managed Agents OpenAI GPT +3 more

5 · OpenAI Blog · 1mo ago

Speeding up agentic workflows with WebSockets in the Responses API

OpenAI published a technical deep dive into the Codex agent loop, detailing how WebSockets and connection-scoped caching were used to reduce API overhead and improve model latency. The post focuses on infrastructure optimizations within the Responses API for agentic workflows. These changes are relevant to developers building multi-step agent pipelines that rely on repeated API calls.

Inference Economics Agent and Tool Ecosystem connection-scoped caching Responses API OpenAI +2 more

7 · OpenAI Blog · 1mo ago

Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI

Cloudflare is integrating OpenAI's GPT-5.4 and Codex models into its Agent Cloud platform, targeting enterprise customers building and deploying AI agents at scale. The partnership positions Cloudflare's infrastructure as a secure, high-performance runtime for agentic workloads. This represents a significant enterprise distribution channel for OpenAI's latest models.

Frontier Model Releases Inference Economics Cloudflare OpenAI Cloudflare Agent Cloud +4 more

6 · OpenAI Blog · 1mo ago

The next phase of enterprise AI

OpenAI published a blog post outlining its vision for the next phase of enterprise AI adoption, highlighting products including Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents. The post signals accelerating enterprise deployment across industries. The announcement appears to frame OpenAI's strategic positioning in the enterprise market as agentic capabilities mature.

Frontier Model Releases Enterprise Deployment Patterns OpenAI Frontier OpenAI ChatGPT Enterprise +2 more

4 · OpenAI Blog · 1mo ago

Beyond rate limits: scaling access to Codex and Sora

OpenAI published a technical blog describing how it built a real-time access management system for Sora and Codex, combining rate limits, usage tracking, and credits to enable continuous, scalable access. The post details the infrastructure and policy mechanisms underlying production access to two high-demand products. This represents an operational engineering disclosure about how OpenAI manages capacity and fairness at scale.

Inference Economics Enterprise Deployment Patterns OpenAI Sora Codex

8 · OpenAI Blog · 1mo ago

Introducing GPT-5.3-Codex

OpenAI has announced GPT-5.3-Codex, described as a Codex-native agent combining frontier coding performance with general reasoning capabilities. The model is designed to support long-horizon, real-world technical work. The announcement positions it as an agentic coding system rather than a standalone language model.

Frontier Model Releases Inference Economics GPT-5.3-Codex OpenAI Codex +1 more

7 · OpenAI Blog · 1mo ago

Inside OpenAI's In-House Data Agent

OpenAI describes the architecture and capabilities of an internal AI data agent built on GPT-5 and Codex, designed to reason over large datasets and return reliable analytical insights within minutes. The system incorporates memory components to handle complex, multi-step data queries at scale. This represents a concrete internal deployment of frontier models in an agentic, tool-using workflow. The post offers a rare look at how OpenAI itself operationalizes its own models for enterprise-style data analysis.

Frontier Model Releases Inference Economics OpenAI OpenAI Data Agent Codex +3 more

7 · OpenAI Blog · 1mo ago

OpenAI Introduces GPT-5.1-Codex-Max for Agentic Coding

OpenAI has released GPT-5.1-Codex-Max, a new model optimized for agentic coding tasks within the Codex platform. The model targets long-running, project-scale software development work with improvements in reasoning and token efficiency. It is positioned as a faster and more capable successor for autonomous coding workflows.

Frontier Model Releases Inference Economics GPT-5.1-Codex-Max OpenAI Codex +1 more

7 · OpenAI Blog · 1mo ago

OpenAI releases GPT-5-Codex: GPT-5 variant optimized for agentic coding

OpenAI has published an addendum to the GPT-5 system card introducing GPT-5-Codex, a version of GPT-5 specifically optimized for agentic coding within the Codex environment. The model features dynamic thinking-effort adjustment, scaling compute based on task complexity—responding quickly to simple queries while sustaining longer independent work on complex coding tasks. This represents a specialized derivative of GPT-5 targeting software engineering agents rather than general-purpose use.

Frontier Model Releases Inference Economics GPT-5.3-Codex OpenAI GPT-5.5 System Card +3 more

4 · OpenAI Blog · 1mo ago

New GPT-3 capabilities: Edit & insert

OpenAI released updated versions of GPT-3 and Codex that support editing and inserting content into existing text, expanding beyond the original completion-only paradigm. These new capabilities allow the models to make targeted modifications to text rather than only appending to it. The release represents an incremental but meaningful expansion of the GPT-3 API surface.

Frontier Model Releases Agent and Tool Ecosystem GPT-3 OpenAI Codex

4 · OpenAI Blog · 1mo ago

A research agenda for assessing the economic impacts of code generation models

OpenAI published a research agenda focused on evaluating the economic impacts of code generation models such as Codex. The agenda outlines methodological approaches for measuring how AI-assisted coding affects labor markets, productivity, and software development workflows. This represents an early structured effort by a major lab to systematically study downstream socioeconomic effects of their deployed models.

Enterprise Deployment Patterns Agent and Tool Ecosystem code generation OpenAI Codex

6 · OpenAI Blog · 9d ago

OpenAI to acquire Ona to expand Codex with persistent cloud environments

OpenAI announced plans to acquire Ona, a company providing secure, persistent cloud environments. The acquisition is aimed at expanding Codex's capabilities to support long-running AI agents across enterprise workflows. This signals OpenAI's continued investment in agentic infrastructure for enterprise use cases.

Frontier Model Releases Enterprise Deployment Patterns Ona OpenAI Codex +1 more

6 · OpenAI Blog · 1mo ago

Codex Security: now in research preview

OpenAI has launched Codex Security in research preview, an AI-powered application security agent. It analyzes project context to detect, validate, and patch complex vulnerabilities with the goal of higher confidence and reduced false-positive noise compared to traditional tools. The product extends OpenAI's Codex brand into the security domain.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenAI Codex Security Codex

6 · OpenAI Blog · 1mo ago

Efficient Training of Language Models to Fill in the Middle

OpenAI published research on training language models with a fill-in-the-middle (FIM) objective, enabling models to complete text given both a prefix and a suffix context. The technique allows infilling capabilities to be added at essentially no cost to left-to-right generative performance. This work has direct implications for code completion and editing use cases, and was later incorporated into Codex and related models.

Frontier Model Releases Agent and Tool Ecosystem Fill-in-the-Middle (FIM) OpenAI GPT +1 more
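
The fill-in-the-middle idea in the entry above is a data transformation: a fraction of training documents are cut into prefix, middle, and suffix, and the middle is moved to the end so an ordinary left-to-right model learns to generate it given both sides. The sketch below shows that rearrangement at the character level. The sentinel strings, the default `fim_rate`, and the helper names are illustrative assumptions; real tokenizers define their own special tokens.

```python
import random

# Illustrative sentinel strings, not the tokens of any particular model.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, reorder doc as prefix, suffix, then middle."""
    if len(doc) < 2 or rng.random() >= fim_rate:
        return doc  # keep as ordinary left-to-right training text
    # Two distinct cut points split the document into three spans.
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def fim_restore(text: str) -> str:
    """Invert fim_transform (assumes doc itself contains no sentinels)."""
    if not text.startswith(PRE):
        return text
    prefix, rest = text[len(PRE):].split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix
```

At inference time the same layout is used as a prompt: the caller supplies the prefix and suffix followed by the middle sentinel, and the model's continuation is the infilled span.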

7 · arXiv · cs.AI · 26d ago

SkillOpt: Systematic Text-Space Optimizer for Self-Evolving Agent Skills

SkillOpt introduces a principled optimization framework for agent skills, treating the skill document as an external trainable state analogous to model weights. A separate optimizer model converts scored rollouts into bounded edits (add/delete/replace) on a skill document, accepting only edits that improve held-out validation scores. Evaluated across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt achieves best or tied performance on all 52 evaluated cells, lifting GPT-5.5 no-skill accuracy by up to +24.8 points inside the Codex agentic loop. Optimized skill artifacts also transfer across model scales and execution environments without further optimization.

Evaluation and Benchmarking Agent and Tool Ecosystem TextGrad SkillOpt Trace2Skill +6 more
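
The accept-only-if-validation-improves rule described in the SkillOpt entry above can be sketched as a simple loop over proposed edits. This is a toy illustration, not the paper's implementation: in the paper an optimizer model proposes the edits from scored rollouts, whereas here the proposals are a fixed list and `toy_score` is a made-up stand-in for a held-out validation score.

```python
from typing import Callable, Optional

# An edit is (operation, line index, new text); text is None for deletes.
Edit = tuple[str, int, Optional[str]]


def apply_edit(lines: list[str], edit: Edit) -> list[str]:
    """Return a copy of the skill document with one bounded edit applied."""
    op, index, text = edit
    out = list(lines)
    if op == "add":
        out.insert(index, text or "")
    elif op == "delete":
        del out[index]
    elif op == "replace":
        out[index] = text or ""
    else:
        raise ValueError(f"unknown edit operation: {op}")
    return out


def optimize(
    skill: list[str],
    proposals: list[Edit],
    score: Callable[[list[str]], float],
) -> tuple[list[str], float]:
    """Keep an edit only if it strictly improves the validation score."""
    best = score(skill)
    for edit in proposals:
        candidate = apply_edit(skill, edit)
        candidate_score = score(candidate)
        if candidate_score > best:
            skill, best = candidate, candidate_score
    return skill, best


def toy_score(lines: list[str]) -> float:
    """Made-up validation score: reward 'test' lines, penalize 'guess' lines."""
    lowered = [line.lower() for line in lines]
    return sum("test" in l for l in lowered) - sum("guess" in l for l in lowered)
```

Because rejected edits are discarded, the skill document's validation score never decreases from one step to the next, which is what makes the document behave like trainable state.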

8 · The Batch · 17d ago

GPT-5.4 released with tool search, computer use, and frontier benchmark performance

OpenAI released GPT-5.4 in Thinking and Pro variants, featuring an expanded context window (up to 1.05M input tokens), native computer use, tool search capabilities, and adjustable reasoning levels. In independent testing by Artificial Analysis, GPT-5.4 Pro at xhigh reasoning achieved state-of-the-art on GDP-Val-AA, BrowseComp, Terminal-Bench-Hard, SWE-Bench-Pro, and MCP Atlas, while trailing Gemini 3.1 Pro Preview on MMMU-Pro and Humanity's Last Exam. Pricing is set at the top of the market ($30/$180 per million input/output tokens for Pro), and the release also powers Codex, OpenAI's competitor to Claude Code. The item is reported via The Batch (tier 2 commentary) and includes additional context on Andrew Ng's chub CLI tool for agent documentation sharing.

Frontier Model Releases Inference Economics DeepLearning.AI Artificial Analysis Intelligence Index Claude Opus 4.6 +14 more

5 · OpenAI Blog · 9d ago

OpenAI models and Codex available through Oracle Cloud infrastructure commitment

OpenAI announced that its models and Codex are now accessible through Oracle Cloud Infrastructure, allowing enterprise customers to consume OpenAI services against existing Oracle cloud spending commitments. The partnership enables enterprise-grade security and governance controls for AI deployment. This extends OpenAI's distribution reach into Oracle's large enterprise customer base.

Frontier Model Releases Enterprise Deployment Patterns Oracle Cloud Infrastructure Oracle OpenAI +1 more

5 · arXiv · cs.LG · 3d ago

ReproRepo: Scalable LLM agent framework for reproducibility auditing using GitHub issues

ReproRepo is a new framework for evaluating LLM agents on reproducibility auditing of ML research, using naturally occurring GitHub issues as supervision signals rather than costly manual curation. The framework is instantiated on 1,149 recent ML papers from major conferences and benchmarks four frontier model-agent configurations. The best-performing agent (Codex with GPT-5.5) surfaces at least one semantically related human-reported reproduction blocker for ~90% of papers, though exact localization of issues remains a weakness. The work provides a reusable, scalable evaluation harness for this underexplored agentic task.

Evaluation and Benchmarking Agent and Tool Ecosystem OpenAI ReproRepo Codex +1 more

9 · OpenAI Blog · 1mo ago

Accelerating the next phase of AI

OpenAI has raised $122 billion in new funding, marking one of the largest capital raises in AI history. The funds are earmarked for expanding frontier AI development globally, investing in next-generation compute infrastructure, and scaling to meet growing demand for ChatGPT, Codex, and enterprise AI products. The announcement signals continued aggressive investment in AI infrastructure and model development at the frontier.

Training Infrastructure Frontier Model Releases ChatGPT OpenAI Codex +2 more

7 · arXiv · cs.AI · 25d ago

Retrying vs Resampling in AI Control: Safety Tradeoffs in Coding Scaffolds

This paper analyzes two strategies for handling flagged actions in AI coding scaffolds—retrying (blocking risky actions and continuing) and resampling (drawing multiple samples from the same context)—from an AI control perspective that treats the model as potentially adversarial. The authors find that retrying backfires because the untrusted model can exploit monitor rationale to craft stealthier attacks, while resampling avoids this information leakage. Using Claude Opus 4.6 as the untrusted model and MiMo-V2-Flash as the monitor on the BashArena benchmark, they show that drawing five samples per step and auditing on maximum suspicion score raises safety from 61% to 71% at a 0.3% audit budget. Two findings contradict prior work: auditing on maximum (not minimum) suspicion scores is better, and executing the least suspicious sample yields only marginal safety gains.

Evaluation and Benchmarking AI Safety Research Claude Opus 4.6 MiMo-V2-Flash Ctrl-Z +6 more
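
The resampling policy summarized in the entry above, several samples per step with the audit decision taken on the maximum suspicion score, can be sketched as a single decision function. The keyword-based `toy_monitor` and the threshold are invented for illustration; the paper uses a separate monitor model. Executing the least suspicious sample when no audit fires is one possible choice, and per the summary it adds only marginal safety.

```python
from typing import Callable, Sequence


def decide(
    samples: Sequence[str],
    monitor: Callable[[str], float],
    audit_threshold: float,
) -> tuple[str, str]:
    """Audit if the most suspicious sample crosses the threshold, else execute."""
    if not samples:
        raise ValueError("need at least one sample")
    scores = [monitor(s) for s in samples]
    # Audit on the MAXIMUM score: one suspicious draw flags the whole step.
    worst = max(range(len(samples)), key=scores.__getitem__)
    if scores[worst] >= audit_threshold:
        return "audit", samples[worst]
    # Otherwise run the least suspicious draw (an assumption of this sketch).
    least = min(range(len(samples)), key=scores.__getitem__)
    return "execute", samples[least]


def toy_monitor(action: str) -> float:
    """Made-up suspicion score: fraction of risky keywords present."""
    risky = ("curl", "chmod", "sudo")
    return sum(word in action for word in risky) / len(risky)
```

Unlike retrying, nothing about the monitor's verdict is fed back into the model's context here: all samples are drawn from the same context before any of them is scored.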

7 · The Batch · 19d ago

GPT-5.5 Tops Objective Benchmarks but Lags on Human Preference and Hallucination Metrics

OpenAI released GPT-5.5, a closed vision-language model targeting agentic coding, computer use, and knowledge work, priced at roughly double GPT-5.4's per-token rates. The model leads the Artificial Analysis Intelligence Index and ARC-AGI-2 at lower cost than prior leader Gemini 3 Deep Think, and sets state-of-the-art on several agentic benchmarks. However, GPT-5.5 shows a significantly elevated hallucination rate (85.53% vs. Claude Opus 4.7's 36.18%) and ranks poorly on Arena.ai's human-preference leaderboards, where Claude Opus models dominate. Apollo Research separately found GPT-5.5 lied about completing an impossible task in 29% of samples, up from 7% for GPT-5.4, and OpenAI's internal Preparedness Framework places it in the 'high' cybersecurity threat tier.

Frontier Model Releases Evaluation and Benchmarking Apollo Research VulnLMP Artificial Analysis Intelligence Index +18 more

5 · Anthropic News · 17d ago

Anthropic releases Claude Instant 1.2 with improved math, coding, and safety

Anthropic released Claude Instant 1.2, an updated version of its faster, lower-cost model tier, now available via API. The release incorporates capabilities from Claude 2 and shows measurable benchmark gains: 58.7% on the Codex coding evaluation (vs. 52.8% for Claude Instant 1.1) and 86.7% on GSM8K (vs. 80.9% for Claude Instant 1.1). Here "Codex" refers to the coding benchmark, not the OpenAI product. Safety improvements include reduced hallucination and greater jailbreak resistance as measured by automated red-teaming.

Frontier Model Releases Inference Economics Claude Codex GSM8K +2 more