OpenAI launches server-side compaction in the Responses API
OpenAI has shipped server-side compaction in its Responses API, a feature that manages context window usage automatically. Developers no longer need to truncate or summarize conversation history by hand when building long-running or agentic applications. The release is a quality-of-life infrastructure improvement for API consumers.
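For context, the manual truncation that this feature is meant to replace can be sketched as follows. This is a minimal illustration, not OpenAI's compaction algorithm: the `estimate_tokens` heuristic (roughly four characters per token), the `truncate_history` helper, and the message shape are all assumptions made for the example.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def estimate_tokens(text: str) -> int:
    # Crude assumption: about four characters per token.
    # A real application would use the model's tokenizer instead.
    return max(1, len(text) // 4)


def truncate_history(messages: List[Message], budget: int) -> List[Message]:
    """Keep all system messages plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Message] = []

    # Walk backwards from the newest turn and stop at the first one
    # that would push the total over the budget.
    for message in reversed(rest):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    # Restore chronological order before returning.
    return system + list(reversed(kept))
```

Client-side logic of this kind drops older turns wholesale; with server-side compaction the developer would not have to maintain it.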
Related events (8)
New Tools and Features in the Responses API
OpenAI announced new tools and features for its Responses API, expanding the capabilities available to developers building on the platform. Specific details were not available; the update may include additional built-in tools, improved function calling, or new modalities accessible through the API. As a first-party announcement, it represents a meaningful expansion of OpenAI's developer-facing infrastructure.
OpenAI launches WebSocket mode for the Responses API
OpenAI added WebSocket mode to its Responses API, enabling persistent bidirectional connections for API consumers. This is an infrastructure-level capability update that allows lower-latency, streaming-friendly integrations compared to standard HTTP request-response patterns. The change is relevant for developers building real-time or agentic applications on top of OpenAI's API.
Speeding up agentic workflows with WebSockets in the Responses API
OpenAI published a technical deep dive into the Codex agent loop, detailing how WebSockets and connection-scoped caching were used to reduce API overhead and improve model latency. The post focuses on infrastructure optimizations within the Responses API for agentic workflows. These changes are relevant to developers building multi-step agent pipelines that rely on repeated API calls.
OpenAI adds inline moderation scores to Responses API and Chat Completions API
OpenAI has added moderation scoring directly to the Responses API and Chat Completions API, allowing developers to receive moderation results for both inputs and outputs in a single API call. Previously, moderation required a separate API request. This reduces latency and integration complexity for applications that need content safety checks.
From model to agent: Equipping the Responses API with a computer environment
OpenAI describes how it built an agent runtime by combining the Responses API with a shell tool and hosted containers, enabling agents to operate with persistent files, tools, and state. The architecture supports secure, scalable execution of agentic workflows. This represents a concrete infrastructure layer for deploying agents in production environments.
SelfCompact: Model-driven adaptive context compaction for long agent traces
Researchers propose SelfCompact, a scaffold that lets language models decide when and how to compact their own accumulated context during long agentic runs, rather than relying on fixed token-threshold triggers. The system pairs a compaction tool with a lightweight rubric specifying when to invoke or suppress compaction based on trajectory structure (e.g., sub-task completion vs. mid-derivation). Evaluated across six benchmarks and seven models, SelfCompact matches or exceeds fixed-interval summarization while reducing per-question token cost by 30-70%, with gains of up to 18.1 points on math tasks and 5-9 points on agentic search. The work identifies a 'meta-cognitive gap' in unprompted models and shows it can be closed via scaffolding without fine-tuning.
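The difference between a fixed token-threshold trigger and a structure-aware one can be illustrated with a toy sketch. In SelfCompact the model itself makes this decision under a rubric; the code below only mimics the contrast with hand-written rules. The `StepState` fields, both function names, and the numeric thresholds are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepState:
    """Hypothetical snapshot of an agent trajectory at one step."""
    context_tokens: int      # tokens accumulated so far
    subtask_complete: bool   # a sub-task has just finished
    mid_derivation: bool     # the agent is partway through a derivation


def fixed_threshold_trigger(state: StepState, threshold: int = 8000) -> bool:
    # Baseline: compact whenever the context exceeds a fixed size,
    # regardless of what the agent is doing.
    return state.context_tokens >= threshold


def rubric_trigger(state: StepState, min_tokens: int = 2000) -> bool:
    # Structure-aware rule: suppress compaction mid-derivation,
    # and invoke it at sub-task boundaries once there is enough
    # context to make compaction worthwhile (min_tokens is an assumption).
    if state.mid_derivation:
        return False
    return state.subtask_complete and state.context_tokens >= min_tokens
```

Under these assumed rules, a long context in the middle of a derivation fires the fixed trigger but not the rubric, while a shorter context at a sub-task boundary does the opposite.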
OpenAI releases GPT-5.4 and GPT-5.4 pro to the API with computer use, 1M context, and tool search
OpenAI released GPT-5.4 and GPT-5.4 pro to the Chat Completions and Responses APIs, positioning them as frontier models for professional and compute-intensive work. The release bundles several infrastructure capabilities: tool search for deferred runtime tool loading to reduce token usage and improve latency, built-in computer use via screenshot-based UI interaction, a 1M token context window, and native compaction support for long-running agent workflows. Together these additions significantly advance OpenAI's agentic API surface. GPT-5.5 has since become OpenAI's flagship model, making this a prior-generation release.
OpenAI Announces Function Calling, Longer Context, and API Price Reductions
OpenAI introduced function calling capabilities to its API, enabling models to reliably output structured JSON for calling developer-defined functions. The update also includes a longer-context model (gpt-3.5-turbo-16k), more steerable versions of gpt-4 and gpt-3.5-turbo, and reduced pricing on parts of the API. These changes significantly expand the practical utility of OpenAI models for agentic and tool-use applications.
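How an application might act on such structured output can be sketched as below. The example assumes the model's function call arrives as a function name plus a JSON-encoded argument string; `get_weather`, `REGISTRY`, and `dispatch` are hypothetical names for illustration, and no API request is made.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str, unit: str = "celsius") -> Dict[str, Any]:
    # Hypothetical developer-defined function; returns placeholder data.
    return {"city": city, "unit": unit, "forecast": "unknown"}


# Map the function names exposed to the model onto local callables.
REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(call: Dict[str, str]) -> Any:
    """Run the local function named in a model-produced function call."""
    name = call["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown function: {name}")
    # Arguments are assumed to be a JSON string and must be parsed first.
    arguments = json.loads(call["arguments"])
    return REGISTRY[name](**arguments)


# Assumed shape of a function call emitted by the model.
example_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = dispatch(example_call)
```

Keeping an explicit registry means the model can only trigger functions the developer has chosen to expose, and an unrecognized name fails loudly instead of being executed.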


