Entity · benchmark

VBench

benchmark · active · vbench-6b8692fa · 4 events · first seen May 17, 2026

Aliases: VBench

Co-occurring entities

Wan 2.1 · MeanFlow · SD3.5-M · MeanFlowNFT · DiffusionNFT · Lumos-Nexus · VR-Bench · Jiazheng Xing · Unified Progressive Frequency Bridging · NVIDIA B200 · KV Cache · 3D-RoPE · VideoMLA · Multi-head Latent Attention (MLA) · RefDecoder · Inter4K · VideoVAE+ · WebVid · reference attention

More like this (12)

MVBench · VBench I2V · LVBench · VerifierBench · SpecBench · PortBench · AdvBench · VitaBench · VLABench · FilBench · SpatialBench · ProgramBench

Recent events (4)

5 · arXiv · cs.LG · Jul 17, 2026

MeanFlowNFT extends forward-process RL to few-step MeanFlow generators for image and video

MeanFlowNFT is a new RL fine-tuning framework that adapts the DiffusionNFT forward-process RL objective to MeanFlow generators, which use average-velocity predictions for fast few-step sampling. The key contribution is an induced instantaneous-velocity predictor derived from the MeanFlow identity, allowing reward optimization without disrupting MeanFlow's efficient sampling. The method inherits DiffusionNFT's policy-improvement guarantee and outperforms prior RL-tuned few-step generators on SD3.5-M and Wan 2.1, with 4-step MeanFlowNFT surpassing 50-step RL-tuned diffusion on VBench.
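For reference, the MeanFlow identity from the original MeanFlow formulation relates the average velocity u over an interval [r, t] to the instantaneous velocity v at time t. Rearranging it gives one natural candidate for an induced instantaneous-velocity predictor. The summary does not spell out the exact form MeanFlowNFT uses, so the last line below, and the symbols u_θ and v̂_θ, are assumptions for illustration.

```latex
% Average velocity over [r, t] (MeanFlow definition)
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau

% MeanFlow identity: differentiate (t - r)\,u(z_t, r, t) with respect to t
u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t),
\qquad
\frac{d}{dt} u(z_t, r, t) = v(z_t, t)\,\partial_z u + \partial_t u

% Assumed form of an induced instantaneous-velocity predictor (rearranged identity)
\hat{v}_\theta(z_t, t) = u_\theta(z_t, r, t) + (t - r)\,\frac{d}{dt} u_\theta(z_t, r, t)
```

At r = t the correction term vanishes and the average velocity coincides with the instantaneous one, so u_θ(z_t, t, t) is another possible reading of the induced predictor.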

Alignment and RLHF · Multimodal Progress · MeanFlow · SD3.5-M · VBench

5 · arXiv · cs.AI · Jun 1, 2026

Lumos-Nexus: Efficient Frequency Bridging for Reasoning-Driven Video Generation

Lumos-Nexus is a training-efficient unified video generation framework that decouples training and inference to achieve high visual fidelity without prohibitive compute costs. During training, a lightweight generator is aligned with an understanding block; at inference, Unified Progressive Frequency Bridging (UPFB) hands off generation to a high-capacity pretrained generator in a shared latent space for coarse-to-fine refinement. The authors also introduce VR-Bench, a new benchmark for evaluating reasoning-driven video generation. Code and models are publicly released.

Evaluation and Benchmarking · Inference Economics · Lumos-Nexus · VBench · VR-Bench

6 · arXiv · cs.AI · May 29, 2026

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

VideoMLA applies Multi-Head Latent Attention (MLA) to causal video diffusion, replacing per-head keys and values with a shared low-rank content latent and a decoupled 3D-RoPE positional key, which yields a 92.7% reduction in per-token KV memory. The paper investigates why MLA works even though pretrained video attention is not low-rank (unlike the spectral assumption that motivates MLA in LLMs), and finds that the MLA bottleneck itself, rather than the pretrained spectrum, determines the effective rank. On VBench, VideoMLA matches short-horizon baselines, achieves the best overall score at long horizons, and delivers a 1.23x throughput improvement on a single NVIDIA B200 GPU.
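The per-token KV saving described above can be illustrated with simple arithmetic. The following is a minimal sketch, not the paper's configuration: the head count, head dimension, latent width, and positional-key width are hypothetical values chosen only to show how a shared latent plus one decoupled positional key compares with per-head keys and values.

```python
def kv_floats_per_token_mha(num_heads: int, head_dim: int) -> int:
    """Standard multi-head attention caches one key and one value vector per head."""
    return 2 * num_heads * head_dim


def kv_floats_per_token_mla(latent_dim: int, rope_dim: int) -> int:
    """MLA-style cache: one shared content latent plus one decoupled positional key."""
    return latent_dim + rope_dim


def relative_reduction(baseline: int, compressed: int) -> float:
    """Fraction of per-token cache entries saved relative to the baseline."""
    return 1.0 - compressed / baseline


# Hypothetical dimensions (assumptions, not taken from VideoMLA).
NUM_HEADS, HEAD_DIM = 12, 128
LATENT_DIM, ROPE_DIM = 512, 64

mha_floats = kv_floats_per_token_mha(NUM_HEADS, HEAD_DIM)
mla_floats = kv_floats_per_token_mla(LATENT_DIM, ROPE_DIM)
saving = relative_reduction(mha_floats, mla_floats)

print(f"MHA: {mha_floats} values/token, MLA: {mla_floats} values/token")
print(f"Relative reduction: {saving:.4f}")
```

Under this accounting the saving depends only on the ratio of the latent and positional widths to the full per-head key/value width, which is why a narrow bottleneck can shrink the cache substantially.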

Training Infrastructure · Long Context Evolution · NVIDIA B200 · KV Cache · 3D-RoPE

5 · arXiv · cs.LG · May 17, 2026

RefDecoder: Reference-Conditioned Video VAE Decoder for Enhanced Visual Generation

RefDecoder addresses an architectural asymmetry in latent diffusion models: denoising networks are heavily conditioned, but decoders remain unconditional, which causes detail loss and inconsistency. The approach injects high-fidelity reference image signals into the VAE decoding process via reference attention, with a lightweight image encoder mapping reference frames into high-dimensional tokens that are co-processed at each decoder up-sampling stage. Evaluated on the Inter4K, WebVid, and Large Motion benchmarks, RefDecoder achieves up to +2.1 dB PSNR over unconditional baselines and improves VBench I2V scores across subject consistency, background consistency, and overall quality. The module is plug-and-play and compatible with existing video generation systems, including Wan 2.1 and VideoVAE+, without additional fine-tuning.
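To put the PSNR figure in context, the sketch below computes PSNR from mean squared error using the standard definition. The summary does not say which peak value or frame-averaging scheme the evaluation uses, so the `max_value` default and the flat-list input format here are assumptions.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between two equally sized, non-empty pixel sequences."""
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((a - b) ** 2 for a, b in zip(reference, reconstruction))
    return total / len(reference)


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(reference, reconstruction)
    if err == 0.0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / err)
```

Because PSNR is logarithmic in MSE, a gain of 2.1 dB at a fixed peak value corresponds to the MSE falling by a factor of 10^0.21, or roughly 1.6.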

Inference Economics · Multimodal Progress · VBench · RefDecoder · Inter4K