4OpenAI Release Notes·2d ago

OpenAI launches server-side compaction in the Responses API

OpenAI has shipped server-side compaction to its Responses API, a feature that manages context window usage automatically on the server side. This reduces the burden on developers to manually truncate or summarize conversation history when building long-running or agentic applications. The release is a quality-of-life infrastructure improvement for API consumers.

Long Context Evolution Agent and Tool Ecosystem Responses API OpenAI

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Name

Read asBeginner In-depth

Long Context EvolutionTopic guide

Long Context Evolution: From Bigger Windows to Smarter Memory

Read asBeginner In-depth

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Read asBeginner In-depth

Related events (8)

6Openai Blog·1mo ago·source ↗

New Tools and Features in the Responses API

OpenAI announced new tools and features for its Responses API, expanding the capabilities available to developers building on the platform. The update likely includes additional built-in tools, improved function calling, or new modalities accessible through the API. As a Tier 1 source announcement, this represents a meaningful expansion of OpenAI's developer-facing infrastructure. Specific details were not available in the body text provided.

Enterprise Deployment Patterns Agent and Tool Ecosystem Responses API OpenAI

4Openai Release Notes·2d ago·source ↗

OpenAI launches WebSocket mode for the Responses API

OpenAI added WebSocket mode to its Responses API, enabling persistent bidirectional connections for API consumers. This is an infrastructure-level capability update that allows lower-latency, streaming-friendly integrations compared to standard HTTP request-response patterns. The change is relevant for developers building real-time or agentic applications on top of OpenAI's API.

Inference Economics Agent and Tool Ecosystem Responses API OpenAI

5Openai Blog·1mo ago·source ↗

Speeding up agentic workflows with WebSockets in the Responses API

OpenAI published a technical deep dive into the Codex agent loop, detailing how WebSockets and connection-scoped caching were used to reduce API overhead and improve model latency. The post focuses on infrastructure optimizations within the Responses API for agentic workflows. These changes are relevant to developers building multi-step agent pipelines that rely on repeated API calls.

Inference Economics Agent and Tool Ecosystem connection-scoped caching Responses API OpenAI +2 more

4Openai Release Notes·2d ago·source ↗

OpenAI adds inline moderation scores to Responses API and Chat Completions API

OpenAI has added moderation scoring directly to the Responses API and Chat Completions API, allowing developers to receive moderation results for both inputs and outputs in a single API call. Previously, moderation required a separate API request. This reduces latency and integration complexity for applications that need content safety checks.

AI Safety Research Enterprise Deployment Patterns Responses API OpenAI Completions API

7Openai Blog·1mo ago·source ↗

From model to agent: Equipping the Responses API with a computer environment

OpenAI describes how it built an agent runtime by combining the Responses API with a shell tool and hosted containers, enabling agents to operate with persistent files, tools, and state. The architecture supports secure, scalable execution of agentic workflows. This represents a concrete infrastructure layer for deploying agents in production environments.

Shell Tool Responses API OpenAI

6arXiv · cs.CL·10d ago·source ↗

SelfCompact: Model-driven adaptive context compaction for long agent traces

Researchers propose SelfCompact, a scaffold that lets language models decide when and how to compact their own accumulated context during long agentic runs, rather than relying on fixed token-threshold triggers. The system pairs a compaction tool with a lightweight rubric specifying when to invoke or suppress compaction based on trajectory structure (e.g., sub-task completion vs. mid-derivation). Evaluated across six benchmarks and seven models, SelfCompact matches or exceeds fixed-interval summarization while reducing per-question token cost by 30-70%, with gains of up to 18.1 points on math tasks and 5-9 points on agentic search. The work identifies a 'meta-cognitive gap' in unprompted models and shows it can be closed via scaffolding without fine-tuning.

Long Context Evolution Inference Economics SelfCompact Self-Compacting Language Model Agents +1 more

7Openai Release Notes·2d ago·source ↗

OpenAI releases GPT-5.4 and GPT-5.4 pro to the API with computer use, 1M context, and tool search

OpenAI released GPT-5.4 and GPT-5.4 pro to the Chat Completions and Responses API, positioning them as frontier models for professional and compute-intensive work. The release bundles several infrastructure capabilities: tool search for deferred runtime tool loading to reduce token usage and improve latency, built-in computer use via screenshot-based UI interaction, a 1M token context window, and native Compaction support for long-running agent workflows. These additions collectively advance OpenAI's agentic API surface significantly. Note: as of the current canonical facts, GPT-5.5 is the current OpenAI flagship, making this a prior-generation release.

Long Context Evolution Frontier Model Releases Responses API GPT Pro OpenAI +3 more

8Openai Blog·1mo ago·source ↗

OpenAI Announces Function Calling, Longer Context, and API Price Reductions

OpenAI introduced function calling capabilities to its API, enabling models to reliably output structured JSON for calling developer-defined functions. The update also includes longer context windows, more steerable models (gpt-3.5-turbo-16k and gpt-4 updates), and reduced pricing on several API tiers. These changes significantly expand the practical utility of OpenAI models for agentic and tool-use applications.

Long Context Evolution Frontier Model Releases GPT-3.5 Turbo OpenAI API OpenAI +4 more