Vision-Language-Action models
vision-language-action-models-d6ef443b · 3 events · first seen 19d ago
Aliases: Vision-Language-Action models
Recent events (3)
RoboWits: Benchmark for Robotic Creative Problem Solving Under Unexpected Conditions
RoboWits is a new bi-manual robotic benchmark designed to evaluate cognitive reasoning, creative tool use, and robustness to unexpected conditions in robotics. The authors introduce an automated multi-agent task generation pipeline that produces 30 seed tasks and 208 mutated tasks spanning geometry, material, and assembly-based reasoning. Benchmarking results show that pre-trained Vision-Language-Action models (VLAs) achieve limited success on seed tasks after fine-tuning but fail on mutated variants, exposing brittleness in reasoning and strategy adaptation. The benchmark highlights a significant gap between skill-level execution and genuine cognitive reasoning in current robotic systems.
DynaFLIP: Dynamics-Aware Multimodal Pre-Training for Robot Manipulation Perception
DynaFLIP is a pre-training framework that injects motion understanding into visual encoders for robot manipulation by constructing image-language-3D flow triplets from human and robot videos. The method encourages tri-modal alignment via simplex-volume minimization in a shared hyperspherical space, combined with cosine regularization and contrastive objectives. The resulting dynamics-aware visual backbone consistently outperforms baselines across diverse downstream policies including VLAs, with gains of up to 22.5% in out-of-distribution scenarios. The work argues that robot generalization requires encoding how the world changes under action, not just static scene content.
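To make the idea of simplex-volume alignment concrete, here is a minimal sketch of one plausible reading: project the image, language, and flow embeddings onto the unit hypersphere and measure the area of the triangle (2-simplex) they span, computed from the Gram determinant of two edge vectors. The summary does not give DynaFLIP's actual loss, so the function names, the choice of triangle area, and the toy vectors below are all assumptions for illustration, not the authors' implementation.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def simplex_area(image, text, flow):
    """Area of the triangle whose vertices are the three unit-normalized
    embeddings (hypothetical stand-in for a simplex-volume term)."""
    a, b, c = normalize(image), normalize(text), normalize(flow)
    u = [y - x for x, y in zip(a, b)]  # edge from image to text
    w = [y - x for x, y in zip(a, c)]  # edge from image to flow
    # Gram determinant of the two edges equals the squared parallelogram area.
    gram_det = dot(u, u) * dot(w, w) - dot(u, w) ** 2
    return 0.5 * math.sqrt(max(gram_det, 0.0))  # clamp tiny negative round-off

# Toy embeddings (made up): same direction vs. mutually orthogonal.
aligned = simplex_area([1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0])
spread = simplex_area([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
print(aligned, spread)
```

Under this reading, the area is zero when all three embeddings point the same way and grows as they spread apart, so minimizing it pulls the modalities together. The area also vanishes when only two of the three vertices coincide, which may be one reason such a term is paired with cosine and contrastive objectives, as the summary describes.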
Mistral AI Announces Strategic Partnerships with SAP and Helsing for German/European AI Sovereignty
Mistral AI has announced a multiyear partnership with SAP to deliver a sovereign AI stack for Germany and Europe, integrating Mistral models into SAP's AI Foundation and co-developing industry-specific solutions. Separately, Mistral is partnering with defense-AI firm Helsing to develop vision-language-action models for defense and security applications. The company is also expanding its physical presence in Germany with a new office and increased local headcount, framing these moves as part of a broader commitment to European AI autonomy.