Entity · benchmark

RoboWits

benchmark · active · robowits-ad5530ab · 1 event · first seen May 29, 2026

Aliases: RoboWits

Co-occurring entities

Vision-Language-Action models · multi-agent cooperative framework · UMass Embodied AGI · bi-manual robotic manipulation

More like this (12)

RoboTwin · RoboGenesis · RoboTTT · RoboTTT · RoboReward · RoboMME · RoboTHOR · nanobot · Agility Robotics · WrenAI · Wiz Research · Robin AI

Recent events (1)

arXiv · cs.AI · May 29, 2026

RoboWits: Benchmark for Robotic Creative Problem Solving Under Unexpected Conditions

RoboWits is a new bi-manual robotic benchmark designed to evaluate cognitive reasoning, creative tool use, and robustness to unexpected conditions. The authors introduce an automated multi-agent task generation pipeline that produces 30 seed tasks and 208 mutated tasks spanning geometry, material, and assembly-based reasoning. Benchmarking results show that pre-trained Vision-Language-Action models (VLAs) achieve limited success on seed tasks after fine-tuning but fail on the mutated variants, exposing brittleness in reasoning and strategy adaptation. The benchmark highlights a significant gap between skill-level execution and genuine cognitive reasoning in current robotic systems.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Vision-Language-Action models · RoboWits · multi-agent cooperative framework