Almanac
benchmark

Scale AI Audio MultiChallenge

benchmarkactivescale-ai-audio-multichallenge-f98e3859·1 events·first seen 1mo ago

Aliases: Scale AI Audio MultiChallenge

Co-occurring entities

More like this (12)

Recent events (1)

6The Batch·1mo ago·source ↗

OpenAI Updates Audio Models That Reason, Transcribe, and Translate

OpenAI introduced three new audio models in its Realtime API: GPT-Realtime-2 (speech-to-speech with five configurable reasoning effort levels), GPT-Realtime-Translate (70+ input languages), and GPT-Realtime-Whisper (transcription). GPT-Realtime-2 operates as an end-to-end audio model including reasoning, with latency ranging from 1.12 seconds at minimal effort to 2.33 seconds at high effort. Benchmark results are mixed: it leads Scale AI's Audio MultiChallenge and Artificial Analysis Conversational Dynamics but trails Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 on speech reasoning and agentic tasks. The configurable reasoning-latency tradeoff is positioned as a key differentiator for voice agent applications.