ByteDance Launches Seedance 2.0 Video Generation Model Globally via CapCut
ByteDance has deployed Seedance 2.0, a multimodal video generation model, to hundreds of millions of CapCut users across multiple global regions. The model supports text, image, audio, and video inputs with synchronized audio-video output, lip-synced dialogue, and camera control via prompts. It ranks within the top two on Arena AI and Artificial Analysis video leaderboards, and is available via API at $0.30 per second of output. The issue also features Andrew Ng's editorial arguing against the 'AI jobpocalypse' narrative, attributing it to incentive structures at labs and companies.
Related events (8)
ByteDance Deploys Seedance 2.0 Video Model to CapCut's 736M Users as OpenAI Shutters Sora
ByteDance has integrated Seedance 2.0, its multimodal video generation model, into CapCut for paying users across multiple global regions, reaching a platform with approximately 736 million monthly active users. The model supports text, image, audio, and video inputs, generates synchronized audio-video output in a single pass including multi-shot sequences, and ranks in the top two on Arena AI and Artificial Analysis video leaderboards, with Alibaba's HappyHorse-1.0 as its closest competitor. At the same time, OpenAI is discontinuing the Sora app and API after daily active users fell below 500,000 and operating costs reached an estimated $1 million per day. The contrast illustrates a broader market shift in which Chinese developers are accelerating video model releases while U.S. consumer video products retreat.
Grok Imagine 1.0 Sharply Cuts Costs for High-Quality Video Generation
xAI launched Grok Imagine 1.0, a text-and-image-to-video model that topped the Artificial Analysis Video Arena leaderboard in both text-to-video and image-to-video categories at launch. The model generates up to 15-second clips with audio at $4.20 per minute of output, significantly undercutting Google Veo 3.1 ($12/min) and OpenAI Sora 2 Pro ($30/min). It is integrated with the X social network, enabling direct generation and sharing, though xAI disclosed no technical details about the model's architecture. The launch highlights continued rapid cost compression in video generation, with a seven-fold price gap between Grok Imagine 1.0 and Sora 2 Pro.
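To make the price comparison concrete, here is a minimal Python sketch that normalizes the quoted figures to dollars per minute of output and derives the price gap. The per-minute prices come from the Grok Imagine summary above, and the Seedance 2.0 figure is the $0.30-per-second API price from the lead summary. The function names are hypothetical, and the figures were reported separately, so they may not be measured on identical terms.

```python
# Prices as quoted in the summaries, normalized to USD per minute of output.
# Assumption: all figures are list prices for generated output and are
# directly comparable; the summaries do not confirm this.
PRICES_PER_MINUTE = {
    "Grok Imagine 1.0": 4.20,
    "Google Veo 3.1": 12.00,
    "OpenAI Sora 2 Pro": 30.00,
    "Seedance 2.0": 0.30 * 60,  # quoted as $0.30 per second
}


def clip_cost(model: str, seconds: float) -> float:
    """Return the cost in USD of generating a clip of the given length."""
    return round(PRICES_PER_MINUTE[model] * seconds / 60, 2)


def price_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more the first model costs than the second."""
    return PRICES_PER_MINUTE[expensive] / PRICES_PER_MINUTE[cheap]


if __name__ == "__main__":
    # Cost of one 15-second clip (Grok Imagine's stated maximum length).
    for name in PRICES_PER_MINUTE:
        print(f"{name}: ${clip_cost(name, 15):.2f} per 15-second clip")
    gap = price_ratio("OpenAI Sora 2 Pro", "Grok Imagine 1.0")
    print(f"Sora 2 Pro vs. Grok Imagine 1.0: {gap:.1f}x")
```

Dividing $30 by $4.20 gives roughly 7.1, which matches the seven-fold gap cited in the summary.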
A Dive into Text-to-Video Models
A Hugging Face blog post providing an overview of text-to-video generation models as of mid-2023. The post surveys the landscape of approaches, architectures, and key models in the emerging text-to-video space. As a tier-2 commentary piece, it synthesizes existing work rather than presenting novel research.
Introducing Veo 3.1 and Advanced Creative Capabilities
Google DeepMind has announced Veo 3.1, an updated version of its video generation model, with significant enhancements to creative control features. The announcement comes from DeepMind's official blog, indicating a formal product update rather than a research preview. Specific capability details are not provided in the body text, but the framing suggests improvements to user-facing generation controls.
Veo 2 Video Generation Launches in Gemini Advanced and Whisk Animate
Google DeepMind is rolling out Veo 2 video generation capabilities to Gemini Advanced and Whisk, enabling users to create high-resolution eight-second videos from text prompts or animate still images. Gemini Advanced subscribers can generate videos directly from text, while Whisk Animate converts input images into short animated clips. This marks a consumer-facing deployment of Veo 2, DeepMind's second-generation video generation model.
Veo 3.1 Ingredients to Video: More consistency, creativity and control
Google DeepMind has released Veo 3.1, an updated video generation model that improves consistency, creativity, and control in generated clips. The update produces more natural and dynamic video content and adds support for vertical video generation. The announcement comes from DeepMind's official blog, a tier-1 source.
Build Awesome Datasets for Video Generation
Hugging Face published a blog post on constructing high-quality datasets for video generation models. The post likely covers data collection, preprocessing, and curation pipelines relevant to training video diffusion or generation systems. This is a practical tooling and methodology guide aimed at practitioners working on video AI.
DeerFlow 2.0 launches as open-source agent harness; Anthropic sues Pentagon over AI blacklist; Google releases Gemini Embedding 2
ByteDance released DeerFlow 2.0, an open-source agent harness built on LangGraph/LangChain that orchestrates parallel sub-agents with sandboxed Docker environments, progressive skill-loading, and persistent memory for complex workflows. Anthropic filed two lawsuits against the U.S. Pentagon contesting a supply-chain risk blacklist tied to its refusal to remove guardrails preventing Claude's use in autonomous weapons and domestic surveillance, with a potential multibillion-dollar revenue impact. Google released Gemini Embedding 2, a multimodal embedding model that unifies text, images, video, audio, and PDFs in a single vector space and succeeds its text-only predecessor. Meta acquired Moltbook, an agent-to-agent social platform built around the OpenClaw framework, while OpenAI hired OpenClaw's creator and acquired the AI security testing platform Promptfoo.


