Almanac
technique

predictive visual code

technique · active · provisional · predictive-visual-code-83312e7e · 1 event · first seen 15d ago

Aliases: predictive visual code

Recent events (1)

7 · arXiv · cs.CL · 15d ago

AdaCodec: Predictive Visual Coding for Efficient Video MLLMs

AdaCodec introduces a predictive visual code interface for video multimodal large language models (MLLMs) that exploits temporal redundancy in video. Instead of encoding every sampled frame as an independent RGB image, it sends full visual tokens only for reference frames, which are the frames with high conditional predictive cost, and encodes the changes between frames as compact P-tokens. Evaluated against a Qwen3-VL-8B per-frame baseline across eleven benchmarks, AdaCodec uses one-seventh of the token budget (32k vs. 224k tokens) yet surpasses the baseline on all long-video benchmarks, while reducing time-to-first-token from 9.26s to 1.62s (about 5.7x faster).
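
The summary does not say how AdaCodec computes predictive cost or how many tokens each frame type receives, so the following is only a toy sketch of the general idea of spending full tokens on hard-to-predict frames and a few tokens on the rest. Everything in it is an assumption made for illustration: the names `predictive_cost` and `plan_tokens`, the mean-absolute-difference cost, the fixed threshold, and the token counts `FULL_TOKENS` and `P_TOKENS` do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical token counts, chosen only for illustration.
FULL_TOKENS = 256  # tokens spent on a reference frame
P_TOKENS = 16      # tokens spent on a predicted (P) frame


def predictive_cost(reference: List[float], frame: List[float]) -> float:
    """Stand-in cost: mean absolute difference from the current reference.

    This is an assumed proxy, not the conditional predictive cost
    used by AdaCodec.
    """
    return sum(abs(a - b) for a, b in zip(reference, frame)) / len(frame)


def plan_tokens(frames: List[List[float]], threshold: float) -> List[Tuple[str, int]]:
    """Label each frame 'ref' or 'p' and assign it a token count."""
    plan = []
    reference = None
    for frame in frames:
        # The first frame, or any frame that is costly to predict from the
        # current reference, becomes a new reference with full tokens.
        if reference is None or predictive_cost(reference, frame) > threshold:
            plan.append(("ref", FULL_TOKENS))
            reference = frame
        else:
            # Otherwise only the change is encoded, as a few P-tokens.
            plan.append(("p", P_TOKENS))
    return plan


# Five tiny "frames": three near-identical, a scene change, one more similar.
frames = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.9, 1.0, 1.0],
]
plan = plan_tokens(frames, threshold=0.5)
total = sum(count for _, count in plan)
baseline = FULL_TOKENS * len(frames)
print(plan)
print(f"adaptive: {total} tokens, per-frame baseline: {baseline} tokens")
```

In this toy run only the first frame and the scene change are treated as reference frames, so the adaptive plan uses 560 tokens where per-frame encoding would use 1280. The real system presumably learns or estimates the cost rather than thresholding pixel differences.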