Gazer

product · active · provisional · gazer-463e44d4 · 1 event · first seen 42h ago

Aliases: Gazer

Co-occurring entities

Training-Free Semantic Correction for Autoregressive Visual Models

More like this (12)

GASING · MambaGaze · Gaze Heads: How VLMs Look at What They Describe · Gaze Heads: How VLMs Look at What They Describe · STARE · Social Gaze Consistency · Deep Eye · BLINK · Veo · Glow · Gatsby · Golem

Recent events (1)

5 · arXiv · cs.CL · 42h ago

Gazer: Training-free semantic correction for autoregressive visual models using MLLM feedback

Researchers introduce Gazer, a training-free framework that integrates multimodal large language model feedback into the sampling loop of autoregressive visual models (AVMs) to correct semantic errors during generation. The system operates in two stages: Reflective Diagnosis identifies semantic errors in intermediate generation states, and Semantic Correction rewinds and adjusts the generation trajectory to better match the target prompt. Experiments on compositional image and video benchmarks show improved semantic alignment and compositional accuracy across multiple AVMs without additional training. The work addresses a known weakness of next-scale prediction AVMs, where semantic errors accumulate across discrete generation scales.
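The diagnose-then-rewind loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary does not specify how errors are detected, how far the trajectory is rewound, or how feedback steers resampling. Every name here (`sample_with_feedback`, `sample_scale`, `diagnose`, `max_corrections`) is hypothetical, the "MLLM" is a toy string check, and rewinding by exactly one scale is an assumption made for brevity.

```python
from typing import Callable, List, Optional

# Hypothetical representation: the generation state is the list of
# per-scale outputs produced so far (strings stand in for token maps).
State = List[str]


def sample_with_feedback(
    num_scales: int,
    sample_scale: Callable[[State, Optional[str]], str],
    diagnose: Callable[[State], Optional[str]],
    max_corrections: int = 3,
) -> State:
    """Generate scale by scale, consulting a critic after each scale."""
    state: State = []
    hint: Optional[str] = None
    corrections = 0
    while len(state) < num_scales:
        # Sample the next scale, optionally steered by the critic's feedback.
        state.append(sample_scale(state, hint))
        hint = None
        # Diagnosis stage: the critic inspects the intermediate state.
        error = diagnose(state)
        if error is not None and corrections < max_corrections:
            # Correction stage (assumed form): rewind one scale and
            # resample it with the critic's message as a hint.
            state.pop()
            hint = error
            corrections += 1
    return state


def toy_sampler(state: State, hint: Optional[str]) -> str:
    """Toy model: makes a colour mistake at scale 1 unless given a hint."""
    k = len(state)
    if k == 1 and hint is None:
        return "scale1:blue"
    return f"scale{k}:red"


def toy_critic(state: State) -> Optional[str]:
    """Toy stand-in for MLLM feedback: flags any 'blue' output."""
    return "use red" if any("blue" in s for s in state) else None


result = sample_with_feedback(3, toy_sampler, toy_critic)
print(result)
```

The `max_corrections` cap is a design choice of this sketch, not something the summary mentions. It keeps the loop from running forever if the critic never accepts a scale, at the cost of letting an uncorrected error through once the budget is spent.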

Evaluation and Benchmarking · Multimodal Progress · Gazer · Training-Free Semantic Correction for Autoregressive Visual Models