Entity · benchmark

What'sUp benchmark

benchmark · active · what-sup-benchmark-817637a6 · 1 event · first seen May 25, 2026

Aliases: What'sUp benchmark

Co-occurring entities

PGT (Procedurally Generated Tasks) · Multimodal Large Language Models · CV-Bench-2D · LLaVA-v1.5-Instruct

More like this (12)

CORE benchmark · WSADBench · SPOT benchmark · web navigation benchmark · Super-Agent benchmark · DPG Benchmark · DevDataBench · Auto Benchmark Audit (ABA) · MATH benchmark · ProgramBench · PowerCodeBench · RepoBench

Recent events (1)

6 · arXiv · cs.AI · May 25, 2026

PGT: Procedurally Generated Tasks for Improving Visual Grounding in MLLMs

This paper introduces Procedurally Generated Tasks (PGT), a data-driven framework that overlays geometric primitives on images to create dense supervision signals for fine-grained visual grounding in multimodal large language models. PGT serves both as a training augmentation method and a diagnostic tool to isolate perception failures from semantic priors. Instruction tuning on LLaVA-v1.5-Instruct augmented with PGT data yields gains of up to +20% on the What'sUp benchmark and +13.3% on CV-Bench-2D. The results suggest that spatial reasoning deficits in MLLMs stem primarily from inadequate supervision rather than architectural or resolution constraints.
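To make the idea of overlaying a geometric primitive concrete, here is a minimal sketch of one procedurally generated example. It is an illustration only, not the paper's implementation: the function name `make_pgt_example`, the grayscale list-of-rows image format, the square size, and the left/right question are all assumptions chosen for brevity.

```python
import random


def make_pgt_example(image, rng):
    """Overlay one filled square on a copy of `image` and build a QA pair.

    `image` is a hypothetical grayscale image given as a list of rows of
    ints in 0..255. Returns (marked_image, question, answer).
    """
    height, width = len(image), len(image[0])
    size = max(1, min(height, width) // 8)  # side length of the square

    # Pick the top-left corner so the square stays fully inside the image.
    top = rng.randrange(0, height - size + 1)
    left = rng.randrange(0, width - size + 1)

    # Draw on a copy so the source image is left untouched.
    marked = [row[:] for row in image]
    for y in range(top, top + size):
        for x in range(left, left + size):
            marked[y][x] = 255

    # The label follows directly from the known placement, so no human
    # annotation is needed for this example.
    center_x = left + size / 2
    answer = "left" if center_x < width / 2 else "right"
    question = "Is the white square in the left or right half of the image?"
    return marked, question, answer


if __name__ == "__main__":
    blank = [[0] * 32 for _ in range(32)]
    _, question, answer = make_pgt_example(blank, random.Random(0))
    print(question, "->", answer)
```

Because the answer is derived from the overlay coordinates rather than from the image content, a question like this probes perception without leaning on semantic priors, which is the diagnostic use the summary describes.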

Evaluation and Benchmarking · Multimodal Progress · PGT (Procedurally Generated Tasks) · Multimodal Large Language Models · CV-Bench-2D