4 · GitHub Trending (AI/LLM filtered) · 1mo ago

ViMax: Agentic Video Generation System (Director, Screenwriter, Producer, Generator All-in-One)

ViMax is an open-source Python framework from HKUDS that frames video generation as a multi-role agentic pipeline, combining director, screenwriter, producer, and video generator roles into a single system. The project has accumulated 4,524 GitHub stars with 174 added today, indicating significant community traction. It represents an application of agentic AI architectures to the video generation domain.

Agent and Tool Ecosystem Multimodal Progress HKUDS ViMax

Related guides (2)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

4 · GitHub Trending · 4d ago

OpenMontage: open-source agentic video production system with 52 tools and 500+ agent skills

OpenMontage is a newly trending open-source Python project claiming to be the first agentic video production system, offering 12 pipelines, 52 tools, and 500+ agent skills. It is designed to extend AI coding assistants into full video production workflows. The project has accumulated 5,231 GitHub stars with 71 added today, indicating notable community traction.

Agent and Tool Ecosystem Multimodal Progress OpenMontage calesthio

6 · Google DeepMind Blog · 1mo ago

Veo 3.1 Ingredients to Video: More consistency, creativity and control

Google DeepMind has released Veo 3.1, an updated video generation model that improves consistency, creativity, and control in generated clips. The update produces more natural and dynamic video content and adds support for vertical video generation. The announcement was published on DeepMind's official blog.

Frontier Model Releases Multimodal Progress Veo Veo 3.1 Google DeepMind

5 · Latent Space · 20d ago

Why Video Agent Models Are Next — Ethan He, xAI Grok Imagine

Latent Space interviews Ethan He, the lead behind xAI's Grok Imagine video generation product, covering its development in roughly three months. The discussion explores the distinction between video generation models and world models, and positions video agents as a significant near-term frontier. Ethan He argues that Grok Imagine is underrated relative to its capabilities.

Frontier Model Releases Agent and Tool Ecosystem Grok Imagine world model video agents +4 more

5 · GitHub Trending · 1mo ago

HeyGen Hyperframes: HTML-to-Video Rendering Library Built for Agents

HeyGen has open-sourced Hyperframes, a TypeScript library that converts HTML into rendered video output, explicitly designed for use by AI agents. The project has accumulated 19,600 GitHub stars with 351 added today, indicating significant community interest. This positions HeyGen's video generation capabilities as a programmatic, agent-accessible tool rather than a purely human-facing product.

Agent and Tool Ecosystem Multimodal Progress TypeScript Hyperframes HeyGen

6 · Google DeepMind Blog · 1mo ago

Introducing Veo 3.1 and Advanced Creative Capabilities

Google DeepMind has announced Veo 3.1, an updated version of its video generation model, with significant enhancements to creative control features. The announcement comes from DeepMind's official blog, indicating a formal product update rather than a research preview. Specific capability details are not provided in the body text, but the framing suggests improvements to user-facing generation controls.

Frontier Model Releases Multimodal Progress Veo Veo 3.1 Google DeepMind

7 · Google DeepMind Blog · 1mo ago

Veo 2 Video Generation Launches in Gemini Advanced and Whisk Animate

Google DeepMind is rolling out Veo 2 video generation capabilities to Gemini Advanced and Whisk, enabling users to create high-resolution eight-second videos from text prompts or animate still images. Gemini Advanced subscribers can generate videos directly from text, while Whisk Animate converts input images into short animated clips. This marks a consumer-facing deployment of Veo 2, DeepMind's second-generation video generation model.

Frontier Model Releases Enterprise Deployment Patterns Gemini Advanced Veo 2 Whisk +3 more

7 · Qwen Research · 1mo ago

QVQ-Max: Alibaba Qwen Releases Visual Reasoning Model with Multimodal Chain-of-Thought

Alibaba's Qwen team has officially released QVQ-Max, a visual reasoning model succeeding the December 2024 QVQ-72B-Preview. The model is designed to analyze and reason over images and videos, covering domains including mathematics, programming, and creative tasks. It moves beyond the exploratory preview and is positioned as a production-grade multimodal reasoning system.

Frontier Model Releases Agent and Tool Ecosystem Alibaba Qwen QVQ-72B-Preview QVQ-Max +1 more

7 · The Batch · 19d ago

Grok Imagine 1.0 Sharply Cuts Costs for High-Quality Video Generation

xAI launched Grok Imagine 1.0, a text-and-image-to-video model that topped the Artificial Analysis Video Arena leaderboard in both the text-to-video and image-to-video categories at launch. The model generates clips of up to 15 seconds with audio at $4.20 per minute of output, significantly undercutting Google Veo 3.1 ($12/min) and OpenAI Sora 2 Pro ($30/min). It is integrated with the X social network, enabling direct generation and sharing, though xAI disclosed no technical details about the model's architecture. The launch highlights continued rapid cost compression in video generation, with a roughly seven-fold price gap between Grok Imagine 1.0 and Sora 2 Pro.

Frontier Model Releases Evaluation and Benchmarking Artificial Analysis Grok Imagine Google Veo 3.1 +10 more
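
The price comparison in the Grok Imagine 1.0 summary can be checked with a few lines of arithmetic. The sketch below uses only the per-minute prices quoted there. The function names are hypothetical, and it assumes cost scales linearly with clip length, which the summary does not confirm (providers may round or bill in fixed increments).

```python
# Per-minute prices (USD per minute of output) as quoted in the summary above.
PRICE_PER_MINUTE = {
    "Grok Imagine 1.0": 4.20,
    "Google Veo 3.1": 12.00,
    "OpenAI Sora 2 Pro": 30.00,
}


def clip_cost(model: str, seconds: float) -> float:
    """Cost of one clip, ASSUMING linear per-second billing (not confirmed)."""
    return PRICE_PER_MINUTE[model] * seconds / 60.0


def price_ratio(model: str, baseline: str = "Grok Imagine 1.0") -> float:
    """How many times more expensive `model` is than `baseline` per minute."""
    return PRICE_PER_MINUTE[model] / PRICE_PER_MINUTE[baseline]


if __name__ == "__main__":
    for name in PRICE_PER_MINUTE:
        ratio = price_ratio(name)
        cost = clip_cost(name, 15)  # 15 s is the maximum clip length quoted
        print(f"{name}: {ratio:.2f}x baseline, 15 s clip = ${cost:.2f}")
```

Under these assumptions, Sora 2 Pro comes out at about 7.14 times the Grok Imagine 1.0 price and Veo 3.1 at about 2.86 times, which matches the "seven-fold" figure in the summary.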