TriViewBench

benchmark · active · provisional · triviewbench-9eb25594 · 1 event · first seen 22h ago

Aliases: TriViewBench

Recent events (1)

5 · arXiv · cs.AI · 22h ago

TriViewBench: Controlled benchmark reveals fundamental multi-view spatial reasoning failures in MLLMs

Researchers introduce TriViewBench, a synthetic 3D benchmark of 1,923 scenes and more than 14K QA pairs designed to probe multi-view structural reasoning in multimodal large language models (MLLMs) under controlled complexity scaling. Evaluating 18 open- and closed-source models, the study finds a universal capability hierarchy (Local Decision > Object Counting > Global Recovery), with performance collapsing on Global Recovery tasks (an 80% relative drop at the highest complexity level). Chain-of-Thought prompting provides near-zero benefit, suggesting that the bottleneck is cross-view spatial representation rather than reasoning strategy. The work also identifies two mechanistically distinct failure modes in object counting: occlusion blindness, which causes undercounting in single-view tasks, and cross-view identity confusion, which causes overcounting in multi-view tasks.
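
A relative drop is measured against the starting accuracy, not in percentage points, so it is easy to misread. The sketch below shows the distinction. The function name and the accuracy values are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_drop(baseline: float, degraded: float) -> float:
    """Return the fraction of baseline accuracy that was lost.

    relative drop = (baseline - degraded) / baseline
    """
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (baseline - degraded) / baseline


# Hypothetical accuracies (NOT from the paper): a model scoring 0.50 at the
# lowest complexity level and 0.10 at the highest.
low_complexity_acc = 0.50
high_complexity_acc = 0.10

absolute_points = (low_complexity_acc - high_complexity_acc) * 100
relative = relative_drop(low_complexity_acc, high_complexity_acc)

print(f"absolute drop: {absolute_points:.0f} percentage points")
print(f"relative drop: {relative:.0%}")
```

With these illustrative numbers, a loss of 40 percentage points corresponds to an 80% relative drop, because the model keeps only one fifth of its starting accuracy.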