Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do
Recent events (1)
Systematic evaluation reveals limits of multimodal Chain-of-Thought reasoning across perception and reasoning tasks
A new arXiv paper evaluates multimodal Chain-of-Thought (CoT) reasoning on 12 tasks with 22 models (14 non-reasoning, 8 reasoning) and finds that CoT is not universally beneficial: it hurts performance on perception tasks such as visual grounding and object counting, while helping on mathematical and scientific reasoning. The study identifies a 'Look Light, Think Heavy' pattern, in which visual reflection consistently diminishes over the course of a reasoning chain even as verbal reflection fluctuates, and points to deep visual introspection as a key unsolved bottleneck. Open-source multimodal reasoning models show only marginal overall gains, likely because their training overemphasizes mathematical reasoning.