Gemini 3.1 Flash Live Preview
gemini-3-1-flash-live-preview-bf9e23c9·3 events·first seen 1mo agoAliases: Gemini 3.1 Flash Live Preview, Gemini-3.1-flash-live-preview, Gemini-3-Flash-Preview
Co-occurring entities
More like this (12)
Recent events (3)
Thinking Machines Lab Reveals TML-Interaction-Small: Real-Time Multimodal Interaction Model
Thinking Machines Lab (founded by Mira Murati) has announced TML-Interaction-Small, a 276B-parameter mixture-of-experts multimodal model that processes audio, video, and text concurrently using 200ms 'micro-turns' rather than waiting for conversational turns to complete. The architecture uses encoder-free early fusion, pairing a fast foreground interaction model with an asynchronous background reasoning model that shares context. On interactivity benchmarks (FD-bench V1/V1.5), it outperforms GPT-Realtime-2 and Gemini-3.1-flash-live-preview, though it trails GPT-Realtime-2 on intelligence benchmarks. A closed research preview is expected in coming months with wider release later in 2026.
OpenAI Updates Audio Models That Reason, Transcribe, and Translate
OpenAI introduced three new audio models in its Realtime API: GPT-Realtime-2 (speech-to-speech with five configurable reasoning effort levels), GPT-Realtime-Translate (70+ input languages), and GPT-Realtime-Whisper (transcription). GPT-Realtime-2 operates as an end-to-end audio model including reasoning, with latency ranging from 1.12 seconds at minimal effort to 2.33 seconds at high effort. Benchmark results are mixed: it leads Scale AI's Audio MultiChallenge and Artificial Analysis Conversational Dynamics but trails Step-Audio R1.1 Realtime and Grok Voice Think Fast 1.0 on speech reasoning and agentic tasks. The configurable reasoning-latency tradeoff is positioned as a key differentiator for voice agent applications.
Corpus-Grounded Feature Diffusion pipeline for automated IEP generation in Traditional Chinese
Researchers propose a low-resource fine-tuning pipeline called Corpus-Grounded Feature Diffusion (CGFD) to automate Individualized Education Program (IEP) drafting from Traditional Chinese parent-teacher interview transcripts. The approach fine-tunes Breeze-7B with QLoRA on 582 synthetically diffused samples and uses schema-constrained decoding at inference time, finding that Grammar-Constrained Decoding is counterproductive under Traditional Chinese token budgets. On a small formal hold-out (n=10), the system achieves BERTScore F1 of 0.779, outperforming zero-shot GPT-5.4, DeepSeek-V3.2, Gemini-3-Flash-Preview, and Llama-4-Maverick baselines while enabling fully local, air-gapped inference. The work addresses a gap in Traditional Chinese special-education NLP and demonstrates a privacy-preserving deployment pattern for sensitive document generation.