dataset

MMBench2

dataset · active · provisional · mmbench2-c06b538d · 1 event · first seen 4d ago

Aliases: MMBench2

Co-occurring entities

Nicklas Hansen · Hallucination in World Models is Predictable and Preventable

More like this (12)

MemBench · MTBench · MT-Bench · SorryBench · RMCBench · ITBench-AA · FoldBench · RepoBench · MLE-bench · AdvBench · MLE Bench Lite · SPBench

Recent events (1)

6 · arXiv · cs.LG · 4d ago

MMBench2 paper: hallucination in world models is predictable and preventable via coverage signals

Researchers introduce MMBench2, a 427-hour, 210-task dataset for visual world modeling, and train a 350M-parameter world model to study hallucination in generative world models. The paper identifies three distinct hallucination modes (perceptual, action-marginalized, scene-diverging) and develops lightweight signals that predict where models will fail. A coverage-aware sampling technique and curiosity-reward-based data collection enable efficient finetuning to unseen environments with as few as 50 real trajectories. The central finding is that world model hallucination is fundamentally a data coverage problem, with the same signals serving both detection and mitigation.
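The summary does not say how the paper computes its coverage signals, so the following is only an illustrative stand-in and not the authors' method. It sketches a generic count-based notion of coverage: observations are discretized into grid cells, low visit counts are treated as low coverage, low-coverage samples are upweighted when drawing finetuning batches, and a count-based novelty bonus plays the role of a curiosity reward. The helper names (`cell_of`, `coverage_counts`, `coverage_weights`, `curiosity_bonus`, `sample_batch`), the grid width, and the 1/sqrt(count) weighting are all hypothetical choices.

```python
import math
import random
from collections import Counter

def cell_of(obs, width=1.0):
    # Hypothetical discretization: map a feature vector to an integer grid cell.
    return tuple(math.floor(x / width) for x in obs)

def coverage_counts(dataset, width=1.0):
    # Count how many training observations fall into each cell (a crude coverage map).
    return Counter(cell_of(o, width) for o in dataset)

def coverage_weights(dataset, counts, width=1.0):
    # Inverse-sqrt-count weights, normalized to sum to 1:
    # samples from rarely covered cells are drawn more often.
    raw = [1.0 / math.sqrt(counts[cell_of(o, width)]) for o in dataset]
    total = sum(raw)
    return [w / total for w in raw]

def curiosity_bonus(obs, counts, width=1.0):
    # Count-based novelty bonus; Counter returns 0 for unseen cells,
    # so a never-visited cell gets the maximum bonus of 1.0.
    return 1.0 / math.sqrt(counts[cell_of(obs, width)] + 1)

def sample_batch(dataset, weights, k, seed=0):
    # Coverage-aware sampling: draw k items with replacement using the weights.
    rng = random.Random(seed)
    return rng.choices(dataset, weights=weights, k=k)
```

For example, with `data = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.9), (5.0, 5.0)]`, the first three points share one cell, so the isolated fourth point receives a larger sampling weight, and a query at `(10.0, 10.0)` lands in an unseen cell and gets the full bonus. In this toy framing, one coverage map serves both to flag poorly covered inputs and to steer sampling and data collection toward them.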

Evaluation and Benchmarking · Nicklas Hansen · MMBench2 · Hallucination in World Models is Predictable and Preventable