Entity · model

GPT-Realtime-2

model · active · gpt-realtime-2-1d0a1575 · 10 events · first seen May 18, 2026

Aliases: GPT-Realtime-2, gpt-realtime, GPT Realtime 2, gpt-realtime-1.5, GPT-Realtime-2.1, GPT-Realtime

More like this (12)

GPT-Realtime-2.1 mini, GPT-Realtime-Translate, GPT-Realtime-Whisper, GPT-Image-2, GPT-2, GPT-5.2, GPT, GPT-next, GPT-4.1, GPT-1, GPT-4 Turbo, GPTs

Recent events (10)

4 · OpenAI Blog · 44h ago

avatarin deploys GPT-Realtime retail agent at Yamada Denki with 30,000 users in two weeks

avatarin built a 24/7 multilingual retail support agent for Japanese electronics retailer Yamada Denki using OpenAI's GPT-Realtime API. The deployment reached 30,000 users within two weeks of launch, with 92% positive survey responses. The case study demonstrates real-time voice/conversational AI in a high-traffic consumer retail environment.

Enterprise Deployment Patterns, GPT-Realtime-2, OpenAI, avatarin, +1 more

7 · The Batch · Jul 17, 2026

OpenAI GPT-Live Pairs Full-Duplex Voice Models with GPT-5.5 Reasoning Backend

OpenAI released GPT-Live-1 and GPT-Live-1 mini on July 8, 2026, replacing Advanced Voice Mode with a full-duplex voice system that processes audio continuously and delegates harder queries to GPT-5.5 in the background. The architecture separates a real-time conversational voice model from a reasoning model, with user-selectable reasoning effort levels (Instant, Medium, High) routing to GPT-5.5 Instant or GPT-5.5 Thinking accordingly. Performance gains are substantial: GPQA scores jumped from 45.3% (AVM) to 84.2% (GPT-Live-1 at high reasoning), and BrowseComp improved from 0.7% to 75.2%. The system is live globally on iOS, Android, and ChatGPT.com for paid plans, though no developer API has shipped yet.

Frontier Model Releases, Agent and Tool Ecosystem, Thinking Machines, GPT-Live, ChatGPT, +18 more

5 · OpenAI Release Notes · Jul 8, 2026

OpenAI releases GPT-Realtime-2.1 and GPT-Realtime-2.1 mini for voice applications

OpenAI released GPT-Realtime-2.1, an updated realtime reasoning model with improvements to alphanumeric recognition, silence and noise handling, and interruption behavior. A companion model, GPT-Realtime-2.1 mini, was also released as a faster, lower-cost distilled variant for realtime voice use cases. The releases represent incremental improvements to OpenAI's realtime voice API tier rather than a flagship capability shift.

Frontier Model Releases, Multimodal Progress, GPT-Realtime-2, GPT-Realtime-2.1 mini, OpenAI

5 · OpenAI Release Notes · Jul 1, 2026

OpenAI releases gpt-realtime-1.5 and gpt-audio-1.5 to production APIs

OpenAI has released gpt-realtime-1.5 to the Realtime API and gpt-audio-1.5 to the Chat Completions API. These are incremental model updates to OpenAI's audio and real-time speech capabilities. The release expands developer access to updated audio-capable models through existing API surfaces.

Frontier Model Releases, Multimodal Progress, OpenAI Chat Completions API, GPT-Realtime-2, gpt-audio-1.5, +2 more

7 · arXiv · cs.CL · Jun 25, 2026

Study finds real-time voice AI systems ignore vocal delivery cues despite perceiving them

A new arXiv paper evaluates four production real-time voice AI systems (OpenAI GPT Realtime 2, Google Gemini 3.1 Flash Live, Qwen3.5 Omni Plus, and Qwen3.5 Omni Flash) on tasks where vocal delivery (distress, fear, sarcasm) carries meaningful information distinct from word content. All four systems consistently act on words alone: they end calls with crying users who deny distress, approve wire transfers requested in a frightened voice, and accept sarcastic consent. Critically, three of the four systems can correctly identify the emotional state when asked directly, revealing a gap between perception and decision-making that the authors term the 'emotional intelligence gap.' Prompting systems to attend to vocal delivery improves performance only partially and inconsistently.

Evaluation and Benchmarking, AI Safety Research, Qwen3.5 Omni Flash, GPT-Realtime-2, Google, +6 more

7 · The Batch · May 23, 2026

Thinking Machines Lab Reveals TML-Interaction-Small: Real-Time Multimodal Interaction Model

Thinking Machines Lab (founded by Mira Murati) has announced TML-Interaction-Small, a 276B-parameter mixture-of-experts multimodal model that processes audio, video, and text concurrently using 200ms 'micro-turns' rather than waiting for conversational turns to complete. The architecture uses encoder-free early fusion, pairing a fast foreground interaction model with an asynchronous background reasoning model that shares context. On interactivity benchmarks (FD-bench V1/V1.5), it outperforms GPT-Realtime-2 and Gemini-3.1-flash-live-preview, though it trails GPT-Realtime-2 on intelligence benchmarks. A closed research preview is expected in coming months with wider release later in 2026.

Frontier Model Releases, Inference Economics, encoder-free early fusion, Thinking Machines, GPT-Realtime-2, +16 more

7 · OpenAI Blog · May 20, 2026

Introducing gpt-realtime and Realtime API updates

OpenAI is releasing a new speech-to-speech model called gpt-realtime alongside expanded Realtime API capabilities. New features include MCP server support, image input, and SIP phone calling support. These updates extend the Realtime API's utility for voice-driven and multimodal agent applications.

Frontier Model Releases, Inference Economics, GPT-Realtime-2, SIP, Realtime API, +4 more

7 · Latent Space · May 19, 2026

GPT-Realtime-2, GPT-Translate, and new Whisper: OpenAI's new SOTA realtime voice APIs

OpenAI has released a suite of new real-time voice and audio APIs including GPT-Realtime-2, a GPT-Translate model, and an updated Whisper, all positioned as state-of-the-art for real-time voice applications. The releases appear to be part of a broader push to deploy GPT-5 capabilities across multiple product surfaces. Coverage comes from the Latent Space AI News digest, which aggregates and contextualizes the announcements.

Frontier Model Releases, Agent and Tool Ecosystem, GPT-Realtime-2, OpenAI, Whisper, +3 more

7 · The Batch · May 18, 2026

Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability

This digest covers four substantive AI developments. Anthropic's research shows that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method that uses command-line tools instead of vector indexes and outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing that Claude shows evaluation awareness more often than it discloses.

Frontier Model Releases, Evaluation and Benchmarking, Claude Opus 4.6, GPT-Realtime-2, Claude, +14 more

6 · The Batch · May 18, 2026

OpenAI Updates Audio Models That Reason, Transcribe, and Translate

OpenAI introduced three new audio models in its Realtime API: GPT-Realtime-2 (speech-to-speech with five configurable reasoning effort levels), GPT-Realtime-Translate (70+ input languages), and GPT-Realtime-Whisper (transcription). GPT-Realtime-2 operates as an end-to-end audio model including reasoning, with latency ranging from 1.12 seconds at minimal effort to 2.33 seconds at high effort. Benchmark results are mixed: it leads Scale AI's Audio MultiChallenge and Artificial Analysis Conversational Dynamics but trails Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 on speech reasoning and agentic tasks. The configurable reasoning-latency tradeoff is positioned as a key differentiator for voice agent applications.

Frontier Model Releases, Evaluation and Benchmarking, Scale AI Audio MultiChallenge, GPT-Realtime-2, Google, +14 more