6The Batch (DeepLearning.AI)·19d ago

ByteDance Launches Seedance 2.0 Video Generation Model Globally via CapCut

ByteDance has deployed Seedance 2.0, a multimodal video generation model, to hundreds of millions of CapCut users across multiple global regions. The model supports text, image, audio, and video inputs with synchronized audio-video output, lip-synced dialogue, and camera control via prompts. It ranks within the top two on Arena AI and Artificial Analysis video leaderboards, and is available via API at $0.30 per second of output. The issue also features Andrew Ng's editorial arguing against the 'AI jobpocalypse' narrative, attributing it to incentive structures at labs and companies.

Frontier Model Releases Inference Economics Multimodal Progress Seedance 2.0 Artificial Analysis CapCut ByteDance Dreamina BytePlus OpenAI Sora Andrew Ng Arena AI Volcengine

Related guides (3)

Frontier Model ReleasesTopic guide

Frontier Model Releases: The Race From Language to Action

Read asBeginner In-depth

Multimodal ProgressTopic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Read asBeginner In-depth

Inference EconomicsTopic guide

Inference Economics: The Cost of Running AI in Production

Read asBeginner In-depth

Related events (8)

7The Batch·19d ago·source ↗

ByteDance Deploys Seedance 2.0 Video Model to CapCut's 736M Users as OpenAI Shutters Sora

ByteDance has integrated Seedance 2.0, its multimodal video generation model, into CapCut for paying users across multiple global regions, reaching a platform with approximately 736 million monthly active users. The model supports text, image, audio, and video inputs, generates synchronized audio-video output in a single pass including multi-shot sequences, and ranks in the top two on Arena AI and Artificial Analysis video leaderboards, with Alibaba's HappyHorse-1.0 as its closest competitor. Simultaneously, OpenAI is discontinuing the Sora app and API after daily active users fell below 500,000 and operating costs reached an estimated $1 million per day. The contrast illustrates a broader market shift where Chinese developers are accelerating video model releases while U.S. consumer video products retreat.

Frontier Model Releases Evaluation and Benchmarking Seedance 2.0 Artificial Analysis CapCut +15 more

7The Batch·18d ago·source ↗

Grok Imagine 1.0 Sharply Cuts Costs for High-Quality Video Generation

xAI launched Grok Imagine 1.0, a text-and-image-to-video model that topped the Artificial Analysis Video Arena leaderboard in both text-to-video and image-to-video categories at launch. The model generates up to 15-second clips with audio at $4.20 per minute of output, significantly undercutting Google Veo 3.1 ($12/min) and OpenAI Sora 2 Pro ($30/min). It is integrated with the X social network, enabling direct generation and sharing, though xAI disclosed no technical details about the model's architecture. The launch highlights continued rapid cost compression in video generation, with a seven-fold price gap between Grok Imagine 1.0 and Sora 2 Pro.

Frontier Model Releases Evaluation and Benchmarking Artificial Analysis Grok Imagine Google Veo 3.1 +10 more

4Hugging Face Blog·1mo ago·source ↗

A Dive into Text-to-Video Models

A Hugging Face blog post providing an overview of text-to-video generation models as of mid-2023. The post surveys the landscape of approaches, architectures, and key models in the emerging text-to-video space. As a tier-2 commentary piece, it synthesizes existing work rather than presenting novel research.

Multimodal Progress text-to-video generation Hugging Face

6Google Deepmind Blog·1mo ago·source ↗

Introducing Veo 3.1 and Advanced Creative Capabilities

Google DeepMind has announced Veo 3.1, an updated version of its video generation model, with significant enhancements to creative control features. The announcement comes from DeepMind's official blog, indicating a formal product update rather than a research preview. Specific capability details are not provided in the body text, but the framing suggests improvements to user-facing generation controls.

Frontier Model Releases Multimodal Progress Veo Veo 3.1 Google DeepMind

7Google Deepmind Blog·1mo ago·source ↗

Veo 2 Video Generation Launches in Gemini Advanced and Whisk Animate

Google DeepMind is rolling out Veo 2 video generation capabilities to Gemini Advanced and Whisk, enabling users to create high-resolution eight-second videos from text prompts or animate still images. Gemini Advanced subscribers can generate videos directly from text, while Whisk Animate converts input images into short animated clips. This marks a consumer-facing deployment of Veo 2, DeepMind's second-generation video generation model.

Frontier Model Releases Enterprise Deployment Patterns Gemini Advanced Veo 2 Whisk +3 more

6Google Deepmind Blog·1mo ago·source ↗

Veo 3.1 Ingredients to Video: More consistency, creativity and control

Google DeepMind has released Veo 3.1, an updated video generation model that improves consistency, creativity, and control in generated clips. The update produces more natural and dynamic video content and adds support for vertical video generation. The announcement comes from DeepMind's official blog as a tier-1 source.

Frontier Model Releases Multimodal Progress Veo Veo 3.1 Google DeepMind

4Hugging Face Blog·1mo ago·source ↗

Build Awesome Datasets for Video Generation

Hugging Face published a blog post on constructing high-quality datasets for video generation models. The post likely covers data collection, preprocessing, and curation pipelines relevant to training video diffusion or generation systems. This is a practical tooling and methodology guide aimed at practitioners working on video AI.

Agent and Tool Ecosystem Multimodal Progress Hugging Face video generation

6The Batch·17d ago·source ↗

DeerFlow 2.0 launches as open-source agent harness; Anthropic sues Pentagon over AI blacklist; Google releases Gemini Embedding 2

ByteDance released DeerFlow 2.0, an open-source agent harness built on LangGraph/LangChain that orchestrates parallel sub-agents with sandboxed Docker environments, progressive skill-loading, and persistent memory for complex workflows. Anthropic filed two lawsuits against the U.S. Pentagon contesting a supply-chain risk blacklist tied to its refusal to remove guardrails preventing Claude's use in autonomous weapons and domestic surveillance, with potential multi-billion dollar revenue impact. Google released Gemini Embedding 2, a multimodal embedding model unifying text, images, video, audio, and PDFs in a single vector space, succeeding the text-only predecessor. Meta acquired Moltbook, an agent-to-agent social platform built around the OpenClaw framework, while OpenAI hired OpenClaw's creator and acquired AI security testing platform Promptfoo.

Regulatory Developments Agent and Tool Ecosystem Ben Parr Jeff Dean Gemini Embedding 2 +17 more