Almanac
product

Gazer

product · active · provisional · gazer-463e44d4 · 1 event · first seen 42h ago

Aliases: Gazer

Recent events (1)

5 · arXiv · cs.CL · 42h ago

Gazer: Training-free semantic correction for autoregressive visual models using MLLM feedback

Researchers introduce Gazer, a training-free framework that integrates multimodal large language model feedback into the sampling loop of autoregressive visual models (AVMs) to correct semantic errors during generation. The system operates in two stages: Reflective Diagnosis identifies semantic errors in intermediate generation states, and Semantic Correction rewinds and adjusts the generation trajectory to better match the target prompt. Experiments on compositional image and video benchmarks show improved semantic alignment and compositional accuracy across multiple AVMs without additional training. The work addresses a known weakness of next-scale prediction AVMs, where semantic errors accumulate across discrete generation scales.
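The diagnose-then-rewind loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `Diagnosis`, `generate_with_feedback`, `sample_scale`, and `diagnose` are hypothetical, strings stand in for per-scale generation states, and a simple callback stands in for the MLLM. It only assumes that a critic inspects intermediate states after each scale and, on a detected error, the sampler discards states back to a chosen scale and resumes with a corrective hint.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Diagnosis:
    """Hypothetical critic verdict on the intermediate states."""
    ok: bool
    rewind_to: int = 0  # scale index to resume from when ok is False
    hint: str = ""      # corrective guidance passed to the sampler on retry


def generate_with_feedback(
    prompt: str,
    num_scales: int,
    sample_scale: Callable[[str, List[str], int, str], str],
    diagnose: Callable[[str, List[str]], Diagnosis],
    max_corrections: int = 3,
) -> List[str]:
    """Coarse-to-fine sampling with a critic in the loop (toy version)."""
    states: List[str] = []
    hint = ""
    corrections = 0
    scale = 0
    while scale < num_scales:
        # Sample the next scale, conditioned on everything kept so far.
        states.append(sample_scale(prompt, states, scale, hint))
        scale += 1
        if corrections >= max_corrections:
            continue  # correction budget spent: finish without further checks
        verdict = diagnose(prompt, states)
        if not verdict.ok:
            # Rewind: drop states from the faulty scale onward and resume there.
            keep = max(0, min(verdict.rewind_to, len(states) - 1))
            del states[keep:]
            scale = keep
            hint = verdict.hint
            corrections += 1
    return states


# Toy stand-ins: the "model" drifts at scale 1 unless it has received a hint.
def toy_sample(prompt: str, states: List[str], scale: int, hint: str) -> str:
    if scale == 1 and not hint:
        return "cat"
    return "dog"


def toy_diagnose(prompt: str, states: List[str]) -> Diagnosis:
    for i, state in enumerate(states):
        if state != prompt:
            return Diagnosis(ok=False, rewind_to=i, hint=f"keep subject: {prompt}")
    return Diagnosis(ok=True)


result = generate_with_feedback("dog", 3, toy_sample, toy_diagnose)
uncorrected = generate_with_feedback("dog", 3, toy_sample, toy_diagnose, max_corrections=0)
```

In this toy run, `result` contains only states that match the prompt, while `uncorrected` (correction budget of zero) keeps the drifted state at scale 1. The `max_corrections` cap is a design choice of the sketch that guarantees termination; the summary does not say how the actual method bounds its retries.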