6 · The Batch (DeepLearning.AI) · 19d ago

Google Debuted Lyria 3, An App That Turns Text or Images Into 30-Second Songs

Google launched Lyria 3, a latent-diffusion music generation model integrated into the Gemini app and YouTube Shorts. It produces 30-second audio clips with vocals and instruments from text or image prompts. Unlike its predecessor, Lyria 2, Lyria 3 was trained on licensed audio data and includes copyright-filtering safeguards, SynthID watermarking, and RLHF fine-tuning. The model is available free of charge to Gemini users aged 18 and over and to YouTube Shorts creators, an estimated audience of 750 million users. Google also acquired ProducerAI (formerly Riffusion) shortly after launch, signaling continued investment in AI music tooling.

Related events (8)

6 · Google DeepMind Blog · 1mo ago

Gemini App Integrates Lyria 3 for AI Music Generation

Google DeepMind has integrated Lyria 3, its most advanced music generation model, into the Gemini app. Users can now generate 30-second music tracks from text or image prompts. This marks a consumer-facing multimodal capability expansion for the Gemini product.

5 · Google DeepMind Blog · 1mo ago

Lyria 3 Pro: DeepMind Launches Upgraded AI Music Generation Model

DeepMind has announced Lyria 3 Pro, an upgraded AI music generation model that enables longer track creation with structural awareness. The release also expands Lyria's availability across more Google products and surfaces. This represents an incremental capability upgrade to DeepMind's generative audio lineup.

7 · Google DeepMind Blog · 1mo ago

Advanced audio dialog and generation with Gemini 2.5

Google DeepMind has announced new audio dialog and generation capabilities in Gemini 2.5. The update extends the model's multimodal capabilities into AI-powered audio interaction and synthesis. No further technical details are provided in the announcement body.

6 · Google DeepMind Blog · 1mo ago

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

DeepMind has released Gemini 3.1 Flash TTS, a new audio model focused on expressive speech generation. The model introduces granular audio tags that allow developers precise control over AI speech output. This represents an incremental advancement in Google's text-to-speech capabilities within the Gemini model family.

6 · Google DeepMind Blog · 1mo ago

Gemini 3.1 Flash Live: Making audio AI more natural and reliable

DeepMind has released Gemini 3.1 Flash Live, a new voice model designed for real-time audio interactions. The model offers improved precision and lower latency than its predecessor, aiming to make voice-based AI interactions more fluid and natural. The announcement appears on DeepMind's official blog, which suggests a production-grade release.

6 · Google DeepMind Blog · 1mo ago

Improved Gemini Audio Models for Powerful Voice Experiences

DeepMind has announced improved Gemini audio models aimed at better voice experiences. The announcement, published on the official DeepMind blog, marks a formal update to the Gemini model family's audio processing and generation features. The body text gave no specific technical details, but the framing suggests advances in speech understanding, synthesis, or real-time voice interaction. The update is part of Google DeepMind's ongoing development of multimodal Gemini capabilities.

6 · Google DeepMind Blog · 11d ago

Google DeepMind launches Gemini 3.5 Live Translate for real-time voice translation

Google DeepMind has released Gemini 3.5 Live Translate, a near real-time speech translation capability powered by Gemini 3.5. The feature is being deployed across Google AI Studio, Google Translate, and Google Meet. This represents a multimodal capability expansion of the Gemini model family into live audio translation at production scale.

7 · Google DeepMind Blog · 1mo ago

Veo 2 Video Generation Launches in Gemini Advanced and Whisk Animate

Google DeepMind is rolling out Veo 2 video generation capabilities to Gemini Advanced and Whisk, enabling users to create high-resolution eight-second videos from text prompts or animate still images. Gemini Advanced subscribers can generate videos directly from text, while Whisk Animate converts input images into short animated clips. This marks a consumer-facing deployment of Veo 2, DeepMind's second-generation video generation model.