RoboWits: Benchmark for Robotic Creative Problem Solving Under Unexpected Conditions
RoboWits is a new bi-manual robotic benchmark designed to evaluate cognitive reasoning, creative tool use, and robustness to unexpected conditions in robotics. The authors introduce an automated multi-agent task generation pipeline that produces 30 seed tasks and 208 mutated tasks spanning geometry, material, and assembly-based reasoning. Benchmarking results show that pre-trained Vision-Language-Action models (VLAs) achieve limited success on seed tasks after fine-tuning but fail on mutated variants, exposing brittleness in reasoning and strategy adaptation. The benchmark highlights a significant gap between skill-level execution and genuine cognitive reasoning in current robotic systems.
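The central finding compares success on seed tasks with success on their mutated variants. The following minimal Python sketch shows one way per-split success rates and the resulting gap could be tallied. The `EpisodeResult` record, its fields, and the helper functions are hypothetical illustrations, not part of any published RoboWits tooling. The only structure taken from the summary is that each task is either a seed or a mutation of a seed, and belongs to a geometry, material, or assembly reasoning category.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Literal, Optional


# Hypothetical record of one evaluation episode; field names are illustrative
# assumptions, not taken from the RoboWits release.
@dataclass(frozen=True)
class EpisodeResult:
    task_id: str
    split: Literal["seed", "mutated"]
    category: Literal["geometry", "material", "assembly"]
    success: bool
    parent_seed: Optional[str] = None  # for mutated tasks: the seed they derive from


def split_success_rates(results: Iterable[EpisodeResult]) -> Dict[str, float]:
    """Fraction of successful episodes for each split present in `results`."""
    successes: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.split] += 1
        successes[r.split] += int(r.success)
    return {split: successes[split] / totals[split] for split in totals}


def generalization_gap(rates: Dict[str, float]) -> float:
    """Seed success rate minus mutated success rate; larger means more brittle."""
    return rates.get("seed", 0.0) - rates.get("mutated", 0.0)


def mutated_success_given_seed_solved(
    results: Iterable[EpisodeResult],
) -> Optional[float]:
    """Success rate on mutated tasks whose parent seed was solved at least once.

    Returns None when no mutated task has a solved parent seed.
    """
    results = list(results)
    solved_seeds = {r.task_id for r in results if r.split == "seed" and r.success}
    relevant = [
        r for r in results if r.split == "mutated" and r.parent_seed in solved_seeds
    ]
    if not relevant:
        return None
    return sum(r.success for r in relevant) / len(relevant)


if __name__ == "__main__":
    # Toy episodes invented for illustration only; these are NOT RoboWits results.
    toy = [
        EpisodeResult("seed_01", "seed", "geometry", True),
        EpisodeResult("seed_02", "seed", "material", False),
        EpisodeResult("seed_01_m1", "mutated", "geometry", False, parent_seed="seed_01"),
        EpisodeResult("seed_02_m1", "mutated", "material", False, parent_seed="seed_02"),
    ]
    rates = split_success_rates(toy)
    print(rates)  # {'seed': 0.5, 'mutated': 0.0}
    print(generalization_gap(rates))  # 0.5
    print(mutated_success_given_seed_solved(toy))  # 0.0
```

Reporting the seed and mutated rates separately, rather than pooling them, keeps a respectable seed score from masking failure on mutations. Conditioning on seeds the policy actually solved separates brittleness under changed conditions from tasks the policy never learned in the first place.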