Thinking-with-Images

technique · active · thinking-with-images-5d2a7e39 · 1 event · first seen May 19, 2026

Aliases: Thinking-with-Images

Co-occurring entities

Multimodal Large Language Models · on-policy self-distillation · Vision-OPD · regional-to-global perception gap

More like this (12)

Deep Think · Chain-of-Thought Reasoning · Tool-Integrated Reasoning · Connecting Speech to Words through Images · extended thinking · Thinking Machines · information-driven imaging framework · Program-of-Thought · Qwen-Image · Thinking Machines Interaction Model · ImageNet · ParaThinker

Recent events (1)

6 · arXiv · cs.CL · May 19, 2026

Vision-OPD: On-Policy Self-Distillation for Fine-Grained Visual Understanding in MLLMs

Vision-OPD addresses a 'regional-to-global perception gap' in multimodal LLMs (MLLMs): models answer fine-grained visual questions more accurately when given cropped evidence regions than when given the full image. The method instantiates a crop-conditioned teacher and a full-image-conditioned student from the same MLLM and minimizes the token-level divergence between them along on-policy rollouts, transferring regional perception to the full-image policy. This self-distillation requires no external teacher models, ground-truth labels, reward verifiers, or inference-time tools. On benchmarks, the method is reported to be competitive with or superior to larger open-source, closed-source, and agentic 'Thinking-with-Images' models.
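
The token-level objective described above can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not say which divergence or direction is used, so the reverse KL, KL(student || teacher), averaged over rollout tokens, is an assumption here. The function names and toy logits are hypothetical; in practice both sets of logits would come from the same MLLM, scored on a response sampled from the full-image student, with the teacher conditioned on the cropped region.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) at a single token position; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def rollout_divergence(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> float:
    """Mean per-token KL(student || teacher) over one on-policy rollout.

    Assumption: student_logits[t] and teacher_logits[t] score the same
    student-sampled token position t; only the visual conditioning differs
    (full image for the student, cropped region for the teacher).
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same rollout")
    if not student_logits:
        return 0.0
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        total += token_kl(softmax(s), softmax(t))
    return total / len(student_logits)


# Toy rollout: 2 token positions, vocabulary of size 3 (made-up numbers).
student = [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0]]
teacher = [[2.0, 0.0, 0.0], [1.0, 0.5, 0.0]]
loss = rollout_divergence(student, teacher)
```

In this toy case only the first position contributes to the loss, since the two distributions agree at the second. In a real training loop the teacher's logits would be treated as constants (no gradient), so that minimizing the loss moves only the full-image student.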

Evaluation and Benchmarking · Agent and Tool Ecosystem · Multimodal Large Language Models · Thinking-with-Images · on-policy self-distillation