Almanac
IV-CoT

technique · active · provisional · iv-cot-d8200aed · 1 event · first seen 44h ago

Aliases: IV-CoT

Recent events (1)

4 · arXiv · cs.AI · 44h ago

IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

Researchers propose Implicit Visual Chain-of-Thought (IV-CoT), a latent visual reasoning framework for text-to-image generation that decomposes visual conditioning queries into a structural-to-semantic cascade. Sketch supervision guides the structural queries during training only, so no sketch extraction is needed at inference time and the implicit CoT reasoning runs in a single forward pass. IV-CoT reports improved results on the GenEval and T2I-CompBench benchmarks. It targets persistent weaknesses of multimodal LLMs in object counts, spatial relations, and attribute binding.
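The structural-to-semantic cascade described above can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the summary does not specify the architecture, so the attention-based query design, the function names (`attend`, `cascade_forward`, `training_loss`), the MSE sketch loss, and the `sketch_weight` parameter are all assumptions made for illustration. The sketch only shows the general idea of one query group feeding the next, with sketch features used as a training-time target and never as an inference-time input.

```python
import numpy as np

def attend(queries, context):
    """Single-head scaled dot-product cross-attention (no learned projections)."""
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context

def cascade_forward(text_tokens, structural_q, semantic_q):
    """Assumed cascade: structural queries read the prompt first, then
    semantic queries read the prompt plus the structural outputs."""
    structural_out = attend(structural_q, text_tokens)
    semantic_ctx = np.concatenate([text_tokens, structural_out], axis=0)
    semantic_out = attend(semantic_q, semantic_ctx)
    return structural_out, semantic_out

def training_loss(structural_out, sketch_features, generation_loss, sketch_weight=0.5):
    """Hypothetical training objective: generation loss plus an MSE term that
    pulls the structural outputs toward sketch features (training only)."""
    sketch_loss = float(np.mean((structural_out - sketch_features) ** 2))
    return generation_loss + sketch_weight * sketch_loss

rng = np.random.default_rng(0)
dim = 16
text_tokens = rng.normal(size=(7, dim))    # stand-in for encoded prompt tokens
structural_q = rng.normal(size=(4, dim))   # stand-in for learned structural queries
semantic_q = rng.normal(size=(6, dim))     # stand-in for learned semantic queries

# Inference path: one forward pass, no sketch input anywhere.
structural_out, semantic_out = cascade_forward(text_tokens, structural_q, semantic_q)
conditioning = np.concatenate([structural_out, semantic_out], axis=0)

# Training path: sketch features appear only inside the loss.
sketch_features = rng.normal(size=structural_out.shape)
loss = training_loss(structural_out, sketch_features, generation_loss=1.0)
```

In this toy setup, `conditioning` stands in for whatever the image generator would consume, and dropping the sketch term (setting `sketch_weight=0`) leaves the forward pass unchanged, which mirrors the claim that sketches are needed only during training.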