Entity · model

GPT-Realtime-Whisper

modelactivegpt-realtime-whisper-9697cc3f·3 events·first seen May 18, 2026

Aliases: GPT-Realtime-Whisper, Realtime Whisper

Co-occurring entities

More like this (12)

GPT-Realtime-2 GPT-Realtime-Translate GPT-Realtime-2.1 mini Whisper ChatGPT GPT-Translate GPT-4 Turbo ChatGPT Pro WebGPT GPT-5.2 faster-whisper GPT-4.1

Recent events (3)

6Openai Release Notes·Jul 1, 2026·source ↗

OpenAI releases Realtime 2, Realtime Translate, and Realtime Whisper for speech-to-speech and streaming audio

OpenAI released three new audio API products: Realtime 2, a speech-to-speech voice model with configurable reasoning for agentic applications; Realtime Translate, for streaming speech translation; and Realtime Whisper, for streaming speech-to-text transcription. The release is accompanied by updated documentation including a dedicated Realtime translation guide and refreshed transcription guidance. These additions expand OpenAI's real-time audio API surface for developers building voice agents and multilingual applications.

Agent and Tool Ecosystem Multimodal Progress GPT-Realtime-Translate OpenAI GPT-Realtime-Whisper +1 more

7The Batch·May 18, 2026·source ↗

Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability

This digest covers four substantive AI developments: Anthropic's research showing that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method using command-line tools instead of vector indexes that outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing Claude shows evaluation awareness more often than it discloses.

Frontier Model Releases Evaluation and Benchmarking Claude Opus 4.6 GPT-Realtime-2 Claude +14 more

6The Batch·May 18, 2026·source ↗

OpenAI Updates Audio Models That Reason, Transcribe, and Translate

OpenAI introduced three new audio models in its Realtime API: GPT-Realtime-2 (speech-to-speech with five configurable reasoning effort levels), GPT-Realtime-Translate (70+ input languages), and GPT-Realtime-Whisper (transcription). GPT-Realtime-2 operates as an end-to-end audio model including reasoning, with latency ranging from 1.12 seconds at minimal effort to 2.33 seconds at high effort. Benchmark results are mixed: it leads Scale AI's Audio MultiChallenge and Artificial Analysis Conversational Dynamics but trails Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 on speech reasoning and agentic tasks. The configurable reasoning-latency tradeoff is positioned as a key differentiator for voice agent applications.

Frontier Model Releases Evaluation and Benchmarking Scale AI Audio MultiChallenge GPT-Realtime-2 Google +14 more