
WikiVQABench

benchmark · active · wikivqabench-5c96c89c · 1 event · first seen 26d ago

Aliases: WikiVQABench

Recent events (1)

5 · arXiv · cs.AI · 26d ago

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

WikiVQABench is a new human-curated VQA benchmark whose questions require external knowledge beyond visual perception. It is constructed by combining Wikipedia images and captions with structured knowledge from Wikidata; LLM-generated question candidates are then reviewed by human annotators. The benchmark targets knowledge-intensive reasoning in vision-language models (VLMs), and the evaluation covers 15 VLMs ranging from 256M to 90B parameters. Accuracy ranges from 24.7% to 75.6%, indicating that the benchmark discriminates meaningfully across model scales. The dataset and code are publicly released.