Introducing the Realtime API
OpenAI has launched the Realtime API, which lets developers build low-latency speech-to-speech experiences directly into their applications. The API accepts audio input and produces audio output natively, so applications no longer need to chain separate transcription and text-to-speech steps around a text model. It is a significant infrastructure offering for voice-enabled AI applications and moves beyond the text-based API paradigm.
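The main change is that audio travels over a persistent connection as JSON events, not through separate transcription and synthesis calls. The sketch below shows only the message shapes. It assumes the beta WebSocket event schema: the client events `input_audio_buffer.append` and `response.create`, the server event `response.audio.delta`, and base64-encoded 16-bit PCM audio. The helper function names are hypothetical. Event names may differ in later API versions. No network connection is made.

```python
import base64
import json

# Hypothetical helpers. The event "type" strings follow the Realtime API
# beta event schema (assumption: names and fields may change between versions).


def audio_append_event(pcm16_chunk: bytes) -> str:
    """Wrap a chunk of raw 16-bit PCM audio in an input_audio_buffer.append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        # Audio is sent as base64 text inside the JSON event.
        "audio": base64.b64encode(pcm16_chunk).decode("ascii"),
    })


def response_create_event() -> str:
    """Ask the model to generate a response from the buffered input."""
    return json.dumps({"type": "response.create"})


def decode_audio_delta(event_json: str) -> bytes:
    """Return raw audio bytes from a response.audio.delta server event, else b""."""
    event = json.loads(event_json)
    if event.get("type") != "response.audio.delta":
        return b""
    return base64.b64decode(event["delta"])


if __name__ == "__main__":
    # 10 ms of silence: 240 samples at 24 kHz mono, 2 bytes per PCM16 sample.
    chunk = b"\x00\x00" * 240
    for message in (audio_append_event(chunk), response_create_event()):
        print(message[:60])
```

A real client would send these strings over the WebSocket connection and decode each `response.audio.delta` payload as it arrives. This lets playback start before the full spoken response has been generated, which is where much of the latency benefit comes from.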
Related guides

Multimodal Progress: How AI Learned to See, Hear, and Act

Enterprise Deployment Patterns: From LLM Demo to Production Reality
Related events (8)
Introducing gpt-realtime and Realtime API updates
OpenAI is releasing a new speech-to-speech model called gpt-realtime alongside expanded Realtime API capabilities. New features include MCP server support, image input, and SIP phone calling support. These updates extend the Realtime API's utility for voice-driven and multimodal agent applications.
Advancing voice intelligence with new models in the API
OpenAI is releasing new realtime voice models via its API with capabilities spanning reasoning, translation, and transcription. The announcement targets developers building voice-enabled applications and represents an expansion of OpenAI's voice intelligence offerings beyond the existing Realtime API. The models are positioned to enable more natural and intelligent voice experiences in production deployments.
How OpenAI Delivers Low-Latency Voice AI at Scale
OpenAI published a technical overview of how it rebuilt its WebRTC stack to support real-time voice AI at global scale. The post covers the infrastructure choices that enable low-latency audio delivery and conversational turn-taking. It is a detailed engineering account of the production systems behind OpenAI's voice products.
OpenAI Updates Audio Models That Reason, Transcribe, and Translate
OpenAI introduced three new audio models in its Realtime API: GPT-Realtime-2 (speech-to-speech with five configurable reasoning effort levels), GPT-Realtime-Translate (70+ input languages), and GPT-Realtime-Whisper (transcription). GPT-Realtime-2 operates as an end-to-end audio model including reasoning, with latency ranging from 1.12 seconds at minimal effort to 2.33 seconds at high effort. Benchmark results are mixed: it leads Scale AI's Audio MultiChallenge and Artificial Analysis Conversational Dynamics but trails Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 on speech reasoning and agentic tasks. The configurable reasoning-latency tradeoff is positioned as a key differentiator for voice agent applications.
GPT-Realtime-2, GPT-Translate, and new Whisper: OpenAI's new SOTA realtime voice APIs
OpenAI has released a suite of new real-time voice and audio APIs including GPT-Realtime-2, a GPT-Translate model, and an updated Whisper, all positioned as state-of-the-art for real-time voice applications. The releases appear to be part of a broader push to deploy GPT-5 capabilities across multiple product surfaces. Coverage comes from the Latent Space AI News digest, which aggregates and contextualizes the announcements.
OpenAI Introduces Next-Generation Audio Models in the API
OpenAI is releasing new audio models via its API, including an updated text-to-speech model that accepts natural-language style instructions (e.g., 'talk like a sympathetic customer service agent'). This marks the first time developers can programmatically control speaking style through prompts rather than fixed voice presets. The release targets voice agent developers seeking finer-grained customization of synthesized speech.
FastRTC: The Real-Time Communication Library for Python
Hugging Face has released FastRTC, a Python library designed to simplify real-time communication (RTC) for AI applications, enabling developers to build voice and video AI pipelines with WebRTC. The library abstracts away the complexity of WebRTC signaling and media handling, allowing direct integration with Python-based AI models. It targets use cases such as real-time speech-to-speech, video processing, and interactive AI agents. The release moves Hugging Face further into real-time AI inference and agent tooling.
OpenAI API Launch
OpenAI announced the release of an API providing programmatic access to its AI models. This marked a significant infrastructure and commercialization milestone, enabling third-party developers to integrate OpenAI's models into their own applications. The launch established the foundation for OpenAI's developer ecosystem and API-first business model.


