paper

Learning Action Priors for Cross-embodiment Robot Manipulation

paper · active · provisional · learning-action-priors-for-cross-embodiment-robot-manipulation-7d20f405 · 1 event · first seen 17h ago

Co-occurring entities

Vision-Language-Action model · Flow Matching

More like this (12)

Geometric Action Model for Robot Policy Learning · bi-manual robotic manipulation · Operator Learning · Open X-Embodiment · Pose-ICL: 3D-Aware In-Context Learning for Pose-Controllable Subject Customization · OpenAI Dexterous Hand · Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents · Modal-Aware Rotary Positional Embedding · A Taxonomy of Conceptual Alignment in Human-Robot Dialogue · Embodied Minds Lab · embodied agents · UMass Embodied AGI

Recent events (1)

5 · arXiv · cs.AI · 17h ago

Two-stage action prior pretraining improves cross-embodiment VLA robot manipulation

Researchers propose a two-stage training framework for Vision-Language-Action (VLA) models that pretrains the action module with motion priors before cross-modal alignment begins. Stage 1 uses a flow-matching-based encoder-decoder to learn temporal motion structure from unconditioned action trajectories alone; Stage 2 transfers this prior to VLA training via decoder reuse and latent distillation. Evaluated across 13 cross-embodiment tasks in simulation and real-world settings, the approach achieves faster convergence, higher success rates, and notably better performance in data-scarce real-world scenarios compared to VLA training without action priors.
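The two training signals described above can be illustrated with a minimal sketch. It assumes the common linear-path form of flow matching (interpolate between Gaussian noise and a data sample, regress the constant velocity between them) and a plain mean-squared-error form for latent distillation. The summary does not specify the paper's actual objectives, network architecture, or loss weights, so every function name below (`velocity_fn`, `flow_matching_loss`, `latent_distillation_loss`) is hypothetical and the "decoder" is just a callable.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; constant in t for linear interpolation."""
    return [b - a for a, b in zip(x0, x1)]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def flow_matching_loss(velocity_fn, x1, z, rng):
    """Stage 1 (assumed form): regress the path velocity at a random time.

    velocity_fn(x_t, t, z) stands in for the action decoder, x1 is a
    flattened action trajectory, and z is the latent produced by a
    (hypothetical) action encoder.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                        # time in [0, 1)
    x_t = interpolate(x0, x1, t)
    return mse(velocity_fn(x_t, t, z), target_velocity(x0, x1))


def latent_distillation_loss(student_latent, teacher_latent):
    """Stage 2 (assumed form): pull the VLA's latent toward the frozen
    Stage 1 encoder's latent for the same trajectory."""
    return mse(student_latent, teacher_latent)


if __name__ == "__main__":
    rng = random.Random(0)
    actions = [0.1, -0.2, 0.3, 0.0]  # toy flattened action chunk

    # Untrained stand-in decoder that always predicts zero velocity.
    def zero_velocity(x_t, t, z):
        return [0.0] * len(x_t)

    print(flow_matching_loss(zero_velocity, actions, z=None, rng=rng))
    print(latent_distillation_loss([0.5, 0.5], [0.4, 0.7]))
```

In this reading, "decoder reuse" would mean initialising the VLA's action head from the `velocity_fn` trained in Stage 1, while the distillation term keeps the VLA's conditioning latent close to the one the pretrained encoder would have produced. How the two terms are weighted is not stated in the summary.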

Agent and Tool Ecosystem · Multimodal Progress · Learning Action Priors for Cross-embodiment Robot Manipulation · Vision-Language-Action model · Flow Matching