Learning Action Priors for Cross-embodiment Robot Manipulation

Recent events (1)

arXiv · cs.AI · 17h ago

Two-stage action prior pretraining improves cross-embodiment VLA robot manipulation

Researchers propose a two-stage training framework for Vision-Language-Action (VLA) models that pretrains the action module with motion priors before cross-modal alignment begins. Stage 1 uses a flow-matching-based encoder-decoder to learn temporal motion structure from unconditioned action trajectories alone; Stage 2 transfers this prior to VLA training via decoder reuse and latent distillation. Evaluated across 13 cross-embodiment tasks in simulation and real-world settings, the approach achieves faster convergence, higher success rates, and notably better performance in data-scarce real-world scenarios compared to VLA training without action priors.
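The two stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the summary does not give the loss definitions, so the straight-line interpolation path, the mean-squared-error distillation term, and every name here (`flow_matching_pair`, `stage2_loss`, `distill_weight`) are assumptions chosen for illustration.

```python
import numpy as np


def flow_matching_pair(actions, rng):
    """Build one flow-matching training example from action chunks.

    actions: array of shape (batch, horizon, action_dim).
    Assumes a straight-line path between Gaussian noise and the data,
    a common choice that the source summary does not confirm.
    """
    noise = rng.standard_normal(actions.shape)       # x_0 ~ N(0, I)
    t = rng.uniform(size=(actions.shape[0], 1, 1))   # one time per sample
    x_t = (1.0 - t) * noise + t * actions            # point on the path
    target_velocity = actions - noise                # d x_t / d t
    return x_t, t, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Stage 1 (assumed form): regress the decoder output onto the velocity."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))


def latent_distillation_loss(student_latent, teacher_latent):
    """Stage 2 (assumed form): pull the VLA latent toward the frozen
    Stage 1 encoder latent."""
    return float(np.mean((student_latent - teacher_latent) ** 2))


def stage2_loss(predicted_velocity, target_velocity,
                student_latent, teacher_latent, distill_weight=1.0):
    """Hypothetical combined objective: action loss through the reused
    decoder plus a weighted distillation term."""
    action_term = flow_matching_loss(predicted_velocity, target_velocity)
    distill_term = latent_distillation_loss(student_latent, teacher_latent)
    return action_term + distill_weight * distill_term
```

In this reading, Stage 1 would train an encoder-decoder on `flow_matching_loss` using action trajectories only, and Stage 2 would keep the decoder while the vision-language backbone learns to produce latents the decoder already understands.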