
Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do


arXiv · cs.CL · 2d ago

Systematic evaluation reveals limits of multimodal Chain-of-Thought reasoning across perception and reasoning tasks

A new arXiv paper evaluates multimodal Chain-of-Thought (CoT) reasoning on 12 tasks with 22 models (14 non-reasoning, 8 reasoning) and finds that CoT is not universally beneficial: it hurts performance on perception tasks such as visual grounding and object counting, while helping on mathematical and scientific reasoning. The study identifies a "Look Light, Think Heavy" pattern in which visual reflection consistently diminishes over the course of a reasoning chain even as verbal reflection fluctuates, pointing to deep visual introspection as a key unsolved bottleneck. Open-source multimodal reasoning models show only marginal overall gains, likely because their training overemphasizes mathematical reasoning.