
GPT-Realtime-2
gpt-realtime-2-1d0a1575·5 events·first seen 1mo agoAliases: GPT-Realtime-2, gpt-realtime
Co-occurring entities
More like this (12)
Recent events (5)
OpenAI Updates Audio Models That Reason, Transcribe, and Translate
OpenAI introduced three new audio models in its Realtime API: GPT-Realtime-2 (speech-to-speech with five configurable reasoning effort levels), GPT-Realtime-Translate (70+ input languages), and GPT-Realtime-Whisper (transcription). GPT-Realtime-2 operates as an end-to-end audio model including reasoning, with latency ranging from 1.12 seconds at minimal effort to 2.33 seconds at high effort. Benchmark results are mixed: it leads Scale AI's Audio MultiChallenge and Artificial Analysis Conversational Dynamics but trails Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 on speech reasoning and agentic tasks. The configurable reasoning-latency tradeoff is positioned as a key differentiator for voice agent applications.
Introducing gpt-realtime and Realtime API updates
OpenAI is releasing a new speech-to-speech model called gpt-realtime alongside expanded Realtime API capabilities. New features include MCP server support, image input, and SIP phone calling support. These updates extend the Realtime API's utility for voice-driven and multimodal agent applications.
GPT-Realtime-2, GPT-Translate, and new Whisper: OpenAI's new SOTA realtime voice APIs
OpenAI has released a suite of new real-time voice and audio APIs including GPT-Realtime-2, a GPT-Translate model, and an updated Whisper, all positioned as state-of-the-art for real-time voice applications. The releases appear to be part of a broader push to deploy GPT-5 capabilities across multiple product surfaces. Coverage comes from the Latent Space AI News digest, which aggregates and contextualizes the announcements.
Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability
This digest covers four substantive AI developments: Anthropic's research showing that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method using command-line tools instead of vector indexes that outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing Claude shows evaluation awareness more often than it discloses.
Thinking Machines Lab Reveals TML-Interaction-Small: Real-Time Multimodal Interaction Model
Thinking Machines Lab (founded by Mira Murati) has announced TML-Interaction-Small, a 276B-parameter mixture-of-experts multimodal model that processes audio, video, and text concurrently using 200ms 'micro-turns' rather than waiting for conversational turns to complete. The architecture uses encoder-free early fusion, pairing a fast foreground interaction model with an asynchronous background reasoning model that shares context. On interactivity benchmarks (FD-bench V1/V1.5), it outperforms GPT-Realtime-2 and Gemini-3.1-flash-live-preview, though it trails GPT-Realtime-2 on intelligence benchmarks. A closed research preview is expected in coming months with wider release later in 2026.