Almanac
benchmark

RoboWits

benchmarkactiveprovisionalrobowits-ad5530ab·1 events·first seen 19d ago

Aliases: RoboWits

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.AI·19d ago·source ↗

RoboWits: Benchmark for Robotic Creative Problem Solving Under Unexpected Conditions

RoboWits is a new bi-manual robotic benchmark designed to evaluate cognitive reasoning, creative tool use, and robustness to unexpected conditions in robotics. The authors introduce an automated multi-agent task generation pipeline that produces 30 seed tasks and 208 mutated tasks spanning geometry, material, and assembly-based reasoning. Benchmarking results show that pre-trained Vision-Language-Action models (VLAs) achieve limited success on seed tasks after fine-tuning but fail on mutated variants, exposing brittleness in reasoning and strategy adaptation. The benchmark highlights a significant gap between skill-level execution and genuine cognitive reasoning in current robotic systems.