Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving

1 event · first seen Jun 9, 2026

Co-occurring entities

NuScenes

More like this (12)

Watch, Remember, Reason: Human-View Video Understanding with MLLMs
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
Same Evidence, Different Answer: Auditing Order Sensitivity in Multimodal Large Language Models
Visual Question Answering
Bias Benchmark for Question Answering
Beyond the Leaderboard: Design Lessons for Trustworthy Multimodal VQA
Evidence Attribution in Visual Document Understanding without Coordinates or Region Labels
Evidence-Backed Video Question Answering
AIR: Adaptive Interleaved Reasoning with Code in MLLMs
When Model Merging Rivals Joint Multi-Task Reinforcement Learning: A Task-Vector Geometry Analysis
Document Visual Question Answering
Towards Root Memories: Benchmarking and Enhancing Implicit Logical Memory Retrieval for Personalized LLMs

Recent events (1)

5 · arXiv · cs.CL · Jun 9, 2026

Benchmark for view-level visual evidence identification in multi-view MLLMs for autonomous driving

A new arXiv preprint introduces a multi-view visual question answering benchmark targeting evidence-source identification in autonomous driving scenarios. Given six synchronized NuScenes camera views and a question, models must identify which camera view supports the answer, not just produce a correct answer. The 122-pair benchmark spans causality, counterfactual reasoning, and intent prediction, and exposes grounding failures that answer-only evaluation misses. The work targets the gap between answer accuracy and correct visual grounding in safety-critical multimodal systems.
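The difference between answer-only evaluation and view-level evidence evaluation can be illustrated with a minimal sketch. This is not the paper's scoring code: the `Prediction` record, the `score` function, and the metric names are hypothetical, and the sketch assumes each question has exactly one supporting camera view. Only the six camera channel names are taken from the nuScenes dataset.

```python
from dataclasses import dataclass

# The six camera channels of the nuScenes sensor suite.
VIEWS = (
    "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
)


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-question record; field names are assumptions."""
    answer_correct: bool  # did the model's answer match the reference answer?
    predicted_view: str   # view the model named as its evidence source
    evidence_view: str    # annotated view that actually supports the answer


def score(preds):
    """Return answer accuracy, view accuracy, and the ungrounded-correct rate."""
    if not preds:
        raise ValueError("need at least one prediction")
    for p in preds:
        if p.predicted_view not in VIEWS or p.evidence_view not in VIEWS:
            raise ValueError(f"unknown camera view in {p!r}")
    n = len(preds)
    answer_acc = sum(p.answer_correct for p in preds) / n
    view_acc = sum(p.predicted_view == p.evidence_view for p in preds) / n
    # Correct answers whose cited view is wrong: invisible to answer-only scoring.
    ungrounded = sum(
        p.answer_correct and p.predicted_view != p.evidence_view for p in preds
    ) / n
    return {
        "answer_acc": answer_acc,
        "view_acc": view_acc,
        "ungrounded_correct": ungrounded,
    }
```

Under these assumptions, a model can score well on `answer_acc` while `ungrounded_correct` stays high, which is the kind of grounding failure the summary says answer-only evaluation misses.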

Evaluation and Benchmarking · Multimodal Progress · NuScenes · Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving