Advanced audio dialog and generation with Gemini 2.5
Google DeepMind has announced new audio dialog and generation capabilities in Gemini 2.5. The update extends the model's multimodal capabilities into AI-powered audio interaction and synthesis. No further technical details are provided in the announcement body.
Related events (8)
Improved Gemini Audio Models for Powerful Voice Experiences
Google DeepMind has announced improved Gemini audio models aimed at better voice experiences. The announcement, published on the official DeepMind blog, is a formal update to the audio processing and generation features of the Gemini model family. The body text gave no specific technical details, but the framing suggests advances in speech understanding, synthesis, or real-time voice interaction. The update is part of Google DeepMind's ongoing development of multimodal Gemini capabilities.
Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Google DeepMind has released Gemini 3.1 Flash TTS, a new audio model focused on expressive speech generation. The model introduces granular audio tags that give developers precise control over AI speech output. The release is an incremental advance in Google's text-to-speech capabilities within the Gemini model family.
Gemini 3.1 Flash Live: Making audio AI more natural and reliable
Google DeepMind has released Gemini 3.1 Flash Live, a new voice model designed for real-time audio interactions. The model offers improved precision and lower latency than its predecessor, with the aim of making voice-based AI interactions more fluid and natural. The announcement was published on DeepMind's official blog, which suggests a production-grade release.
Gemini App Integrates Lyria 3 for AI Music Generation
Google DeepMind has integrated Lyria 3, its most advanced music generation model, into the Gemini app. Users can now generate 30-second music tracks from text or image prompts. This marks a consumer-facing multimodal capability expansion for the Gemini product.
Gemini 3.5: Frontier Intelligence with Action
Google DeepMind has announced Gemini 3.5, a new model generation positioned around agentic capabilities and complex workflow execution. The announcement emphasizes action-oriented AI, suggesting a focus on tool use, multi-step reasoning, and autonomous task completion. The blog post is brief, indicating this may be an initial announcement with further details to follow.
Google DeepMind launches Gemini 3.5 Live Translate for real-time voice translation
Google DeepMind has released Gemini 3.5 Live Translate, a near real-time speech translation capability powered by Gemini 3.5. The feature is being deployed across Google AI Studio, Google Translate, and Google Meet. This represents a multimodal capability expansion of the Gemini model family into live audio translation at production scale.
Updated Gemini 2.5 Pro Preview with Improved Coding Capabilities
Google DeepMind has released an updated version of Gemini 2.5 Pro Preview with enhanced coding capabilities, specifically targeting the development of rich, interactive web applications. The announcement, published on DeepMind's official blog, points to a focused improvement in code generation and web app development. The body text provides no detailed technical specifics or benchmark results.
A new era of intelligence with Gemini 3
DeepMind has published a blog post titled 'A new era of intelligence with Gemini 3,' suggesting a major new model release or announcement in the Gemini series. The body content was not provided, but the title and source indicate this is a flagship model announcement from Google DeepMind. This would represent the next generation of the Gemini model family following Gemini 2.x.


