Almanac
technique

embodiment-aware prompt conditioning

techniqueactiveprovisionalembodiment-aware-prompt-conditioning-4cd71668·1 events·first seen 18d ago

Aliases: embodiment-aware prompt conditioning

Co-occurring entities

More like this (12)

Recent events (1)

7arXiv · cs.CL·18d ago·source ↗

Qwen-VLA: Unified Vision-Language-Action Model Across Robot Tasks, Environments, and Embodiments

Alibaba's Qwen team presents Qwen-VLA, a unified embodied foundation model that extends the Qwen vision-language stack to continuous action and trajectory generation via a DiT-based action decoder. The model is jointly pretrained on diverse data spanning manipulation trajectories, egocentric demonstrations, synthetic simulation, and navigation data, with embodiment-aware prompt conditioning to support multiple robot platforms. A unified action-and-trajectory prediction framework covers manipulation, navigation, and trajectory prediction tasks. Benchmarks show strong results: 97.9% on LIBERO, 73.7% on Simpler-WidowX, 69.0% OSR on R2R navigation, and 76.9% average OOD success in real-world ALOHA experiments.