6·OpenAI Release Notes·2d ago

OpenAI releases Realtime 2, Realtime Translate, and Realtime Whisper for speech-to-speech and streaming audio

OpenAI released three new audio API products: Realtime 2, a speech-to-speech voice model with configurable reasoning for agentic applications; Realtime Translate, for streaming speech translation; and Realtime Whisper, for streaming speech-to-text transcription. The release is accompanied by updated documentation including a dedicated Realtime translation guide and refreshed transcription guidance. These additions expand OpenAI's real-time audio API surface for developers building voice agents and multilingual applications.

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Name

Multimodal Progress (Topic guide)

Multimodal Progress: How AI Learned to See, Hear, and Act

Agent and Tool Ecosystem (Topic guide)

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

7·Latent Space·1mo ago

GPT-Realtime-2, GPT-Translate, and new Whisper: OpenAI's new SOTA realtime voice APIs

OpenAI has released a suite of new real-time voice and audio APIs including GPT-Realtime-2, a GPT-Translate model, and an updated Whisper, all positioned as state-of-the-art for real-time voice applications. The releases appear to be part of a broader push to deploy GPT-5 capabilities across multiple product surfaces. Coverage comes from the Latent Space AI News digest, which aggregates and contextualizes the announcements.

6·The Batch·1mo ago

OpenAI Updates Audio Models That Reason, Transcribe, and Translate

OpenAI introduced three new audio models in its Realtime API: GPT-Realtime-2 (speech-to-speech with five configurable reasoning effort levels), GPT-Realtime-Translate (70+ input languages), and GPT-Realtime-Whisper (transcription). GPT-Realtime-2 operates as an end-to-end audio model including reasoning, with latency ranging from 1.12 seconds at minimal effort to 2.33 seconds at high effort. Benchmark results are mixed: it leads Scale AI's Audio MultiChallenge and Artificial Analysis Conversational Dynamics but trails Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 on speech reasoning and agentic tasks. The configurable reasoning-latency tradeoff is positioned as a key differentiator for voice agent applications.
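
The reasoning-latency tradeoff described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the two latency figures quoted in this summary (minimal and high effort) and does not guess the other three levels. The helper name `pick_effort` and the idea of a per-turn latency budget are assumptions for illustration, not part of OpenAI's API.

```python
from typing import Optional

# Latencies quoted in the summary for GPT-Realtime-2, in seconds.
# Only two of the five effort levels have figures there; the remaining
# three are left out on purpose rather than interpolated.
REPORTED_LATENCY_S = {"minimal": 1.12, "high": 2.33}


def pick_effort(budget_s: float) -> Optional[str]:
    """Hypothetical helper: return the slowest (most reasoning) reported
    level whose latency fits within the budget, or None if none fits."""
    fitting = [(lat, name) for name, lat in REPORTED_LATENCY_S.items()
               if lat <= budget_s]
    return max(fitting)[1] if fitting else None


# Cost of moving from minimal to high effort, per the reported figures.
extra = REPORTED_LATENCY_S["high"] - REPORTED_LATENCY_S["minimal"]
ratio = REPORTED_LATENCY_S["high"] / REPORTED_LATENCY_S["minimal"]
print(f"high adds {extra:.2f}s over minimal ({ratio:.2f}x)")

# A 1.5 s budget only admits minimal effort; 3.0 s admits high; 1.0 s admits neither.
print(pick_effort(1.5), pick_effort(3.0), pick_effort(1.0))
```

On these figures, high effort roughly doubles response latency, which is why a voice agent might reserve it for turns that need multi-step reasoning.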

7·OpenAI Blog·1mo ago

Introducing the Realtime API

OpenAI has launched the Realtime API, enabling developers to build low-latency speech-to-speech experiences directly into their applications. The API provides native audio input and output without requiring separate transcription and text-to-speech steps. It gives voice applications a single audio-native interface in place of a chain of text-based API calls.

7·OpenAI Blog·1mo ago

Advancing voice intelligence with new models in the API

OpenAI is releasing new realtime voice models via its API with capabilities spanning reasoning, translation, and transcription. The announcement targets developers building voice-enabled applications and represents an expansion of OpenAI's voice intelligence offerings beyond the existing Realtime API. The models are positioned to enable more natural and intelligent voice experiences in production deployments.

7·OpenAI Blog·1mo ago

Introducing gpt-realtime and Realtime API updates

OpenAI is releasing a new speech-to-speech model called gpt-realtime alongside expanded Realtime API capabilities. New features include MCP server support, image input, and SIP phone calling support. These updates extend the Realtime API's utility for voice-driven and multimodal agent applications.

5·OpenAI Release Notes·2d ago

OpenAI releases gpt-realtime-1.5 and gpt-audio-1.5 to production APIs

OpenAI has released gpt-realtime-1.5 to the Realtime API and gpt-audio-1.5 to the Chat Completions API. These are incremental model updates to OpenAI's audio and real-time speech capabilities. The release expands developer access to updated audio-capable models through existing API surfaces.

6·OpenAI Blog·1mo ago

How OpenAI Delivers Low-Latency Voice AI at Scale

OpenAI published a technical overview of how it rebuilt its WebRTC stack to support real-time voice AI at global scale. The post covers the infrastructure choices that enable low-latency audio delivery and conversational turn-taking, and it describes the production systems underpinning OpenAI's voice products.

8·OpenAI Blog·1mo ago

Introducing Whisper

OpenAI introduced Whisper, an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model demonstrates strong robustness to accents, background noise, and technical language, approaching human-level accuracy in English transcription. Whisper supports transcription in multiple languages as well as translation to English, and the weights and inference code were released publicly.
