Almanac
paper

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

paperactiveprovisionaldoes-vla-even-know-the-basics-measuring-commonsense-and-world-knowledge-retention-in-vision-language-action-models-ae984616·1 events·first seen 3d ago

Aliases: Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.LG·3d ago·source ↗

Act2Answer: Benchmarking commonsense and world knowledge retention in Vision-Language-Action models

Researchers introduce Act2Answer, a protocol for evaluating how much commonsense and factual knowledge VLA models retain after fine-tuning on robotics data. The approach converts knowledge benchmark questions into tabletop object-placement episodes, yielding action-grounded success rates that reduce confounds from low-level control failures. A large-scale study of 7 VLA models and 9 VLM baselines finds that VLAs retain solid performance on simple concepts but show larger gaps on richer semantic categories compared to their source VLMs, and that VQA co-training is associated with better knowledge retention.