Grok Imagine 1.0 Sharply Cuts Costs for High-Quality Video Generation
xAI launched Grok Imagine 1.0, a text-and-image-to-video model that topped the Artificial Analysis Video Arena leaderboard in both the text-to-video and image-to-video categories at launch. The model generates clips of up to 15 seconds with audio at $4.20 per minute of output, undercutting Google Veo 3.1 ($12/min) and OpenAI Sora 2 Pro ($30/min). It is integrated with the X social network, enabling direct generation and sharing, though xAI disclosed no technical details about the model's architecture. The launch highlights continued rapid cost compression in video generation: Sora 2 Pro costs roughly seven times as much per minute as Grok Imagine 1.0.
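The price comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the per-minute prices quoted in the summary and assumes billing scales linearly with output length, which the summary does not confirm. The function and dictionary names are hypothetical.

```python
# Per-minute output prices in USD, as quoted in the summary above.
PRICE_PER_MINUTE = {
    "Grok Imagine 1.0": 4.20,
    "Google Veo 3.1": 12.00,
    "OpenAI Sora 2 Pro": 30.00,
}


def clip_cost(price_per_minute: float, seconds: float) -> float:
    """Cost of one clip, ASSUMING price is linear in output length."""
    return price_per_minute * seconds / 60.0


def price_ratio(model: str, baseline: str = "Grok Imagine 1.0") -> float:
    """How many times more expensive `model` is than `baseline` per minute."""
    return PRICE_PER_MINUTE[model] / PRICE_PER_MINUTE[baseline]


if __name__ == "__main__":
    for model, price in PRICE_PER_MINUTE.items():
        cost = clip_cost(price, 15)  # 15 s is the maximum clip length cited
        print(f"{model}: ${cost:.2f} per 15 s clip, {price_ratio(model):.1f}x baseline")
```

Under these assumptions a maximum-length 15-second clip costs about $1.05 on Grok Imagine 1.0, $3.00 on Veo 3.1, and $7.50 on Sora 2 Pro, and the Sora 2 Pro ratio works out to about 7.1.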
Related events (8)
Why Video Agent Models Are Next — Ethan He, xAI Grok Imagine
Latent Space interviews Ethan He, the lead behind xAI's Grok Imagine video generation product, covering its development in roughly three months. The discussion explores the distinction between video generation models and world models, and positions video agents as a significant near-term frontier. He argues Grok Imagine is underrated relative to its capabilities.
Sora Video Generation Model Launches at sora.com
OpenAI has publicly launched Sora, its video generation model, available at sora.com. The model supports video generation up to 1080p resolution and 20 seconds in length, with widescreen, vertical, and square aspect ratios. Users can generate content from text prompts or bring existing assets to extend, remix, and blend.
Introducing Veo 3.1 and Advanced Creative Capabilities
Google DeepMind has announced Veo 3.1, an updated version of its video generation model, with significant enhancements to creative control features. The announcement appears on DeepMind's official blog, indicating a formal product update rather than a research preview. It gives few specific capability details, though its framing suggests improvements to user-facing generation controls.
Veo 3.1 Ingredients to Video: More consistency, creativity and control
Google DeepMind has released Veo 3.1, an updated video generation model that improves consistency, creativity, and control in generated clips. The update produces more natural and dynamic video content and adds support for vertical video generation. The announcement comes from DeepMind's official blog.
ByteDance Launches Seedance 2.0 Video Generation Model Globally via CapCut
ByteDance has deployed Seedance 2.0, a multimodal video generation model, to hundreds of millions of CapCut users across multiple global regions. The model supports text, image, audio, and video inputs with synchronized audio-video output, lip-synced dialogue, and camera control via prompts. It ranks within the top two on Arena AI and Artificial Analysis video leaderboards, and is available via API at $0.30 per second of output. The issue also features Andrew Ng's editorial arguing against the 'AI jobpocalypse' narrative, attributing it to incentive structures at labs and companies.
ByteDance Deploys Seedance 2.0 Video Model to CapCut's 736M Users as OpenAI Shutters Sora
ByteDance has integrated Seedance 2.0, its multimodal video generation model, into CapCut for paying users across multiple global regions, reaching a platform with approximately 736 million monthly active users. The model supports text, image, audio, and video inputs, generates synchronized audio-video output in a single pass including multi-shot sequences, and ranks in the top two on Arena AI and Artificial Analysis video leaderboards, with Alibaba's HappyHorse-1.0 as its closest competitor. Simultaneously, OpenAI is discontinuing the Sora app and API after daily active users fell below 500,000 and operating costs reached an estimated $1 million per day. The contrast illustrates a broader market shift where Chinese developers are accelerating video model releases while U.S. consumer video products retreat.
Veo 2 Video Generation Launches in Gemini Advanced and Whisk Animate
Google DeepMind is rolling out Veo 2 video generation capabilities to Gemini Advanced and Whisk, enabling users to create high-resolution eight-second videos from text prompts or animate still images. Gemini Advanced subscribers can generate videos directly from text, while Whisk Animate converts input images into short animated clips. This marks a consumer-facing deployment of Veo 2, DeepMind's second-generation video generation model.
Google DeepMind Introduces Veo 3, Imagen 4, and Flow Filmmaking Tool
Google DeepMind has announced Veo 3 and Imagen 4, new generative video and image models respectively, alongside a filmmaking tool called Flow. The announcement comes from DeepMind's official blog and represents the next generation of their generative media capabilities. These releases expand Google's multimodal generative AI portfolio targeting creative and professional media production use cases.



