Almanac

RoboTwin

benchmark · active · provisional · robotwin-1b2fab3b · 2 events · first seen 18d ago

Aliases: RoboTwin

Recent events (2)

arXiv · cs.AI · 8d ago

AHA-WAM: Asynchronous world-action modeling with temporal decoupling for robot manipulation

AHA-WAM introduces a dual Diffusion Transformer (DiT) architecture for robot manipulation policies that decouples low-frequency world prediction from high-frequency action execution. This addresses an inefficiency in existing world-action models, which force both branches to run at the same temporal resolution. A video DiT with a rolling key-value memory serves as a long-horizon scene planner, while a fast action DiT queries its layerwise latent context through joint attention; Observation-Guided Video-Context Routing lets the two branches run asynchronously. AHA-WAM reaches 92.80% average success on RoboTwin and 78.3% on real-world tasks, running at 24.17 Hz (a 4.59x speedup over Fast-WAM) without pretraining on robot data.
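
To make the idea of temporal decoupling concrete, here is a toy dual-rate schedule: a slow "planner" branch writes new context into a bounded rolling memory only every few ticks, while a fast "actor" branch emits an action on every tick using the most recent cached context. This is a generic illustration under stated assumptions, not AHA-WAM's implementation. The names `RollingKVMemory`, `run_dual_rate_loop`, and `plan_every` are hypothetical, and the string latents stand in for what would be DiT features in the real system; the sketch omits joint attention and the observation-guided routing entirely.

```python
from collections import deque


class RollingKVMemory:
    """Hypothetical fixed-capacity key-value store: once full, appending a new
    entry evicts the oldest one (FIFO), so memory cost stays bounded."""

    def __init__(self, capacity: int) -> None:
        self.entries: deque = deque(maxlen=capacity)

    def append(self, key, value) -> None:
        self.entries.append((key, value))

    def latest(self):
        # Most recent value, or None before the planner has run.
        return self.entries[-1][1] if self.entries else None

    def __len__(self) -> int:
        return len(self.entries)


def run_dual_rate_loop(num_action_steps: int, plan_every: int, memory_capacity: int):
    """Toy asynchronous schedule (illustrative assumption, not the paper's method):
    the slow branch refreshes context every `plan_every` ticks, the fast branch
    acts on every tick without waiting for a new planner step."""
    memory = RollingKVMemory(memory_capacity)
    planner_calls = 0
    actions = []
    for t in range(num_action_steps):
        if t % plan_every == 0:
            # Slow branch (placeholder for a video/world model): store a new latent.
            memory.append(key=t, value=f"latent@{t}")
            planner_calls += 1
        # Fast branch (placeholder for an action model): reuse the cached context.
        actions.append((t, memory.latest()))
    return actions, planner_calls, len(memory)


actions, planner_calls, mem_size = run_dual_rate_loop(10, plan_every=4, memory_capacity=2)
print(planner_calls, mem_size, actions[5])  # prints: 3 2 (5, 'latent@4')
```

With `plan_every=4`, the planner runs on only 3 of 10 ticks while the actor never blocks, and the memory holds at most two entries regardless of episode length. For scale, if the reported 4.59x speedup is a ratio of control rates, the implied Fast-WAM rate is about 24.17 / 4.59 ≈ 5.27 Hz.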

arXiv · cs.CL · 18d ago

Qwen-VLA: Unified Vision-Language-Action Model Across Robot Tasks, Environments, and Embodiments

Alibaba's Qwen team presents Qwen-VLA, a unified embodied foundation model that extends the Qwen vision-language stack to continuous action and trajectory generation through a DiT-based action decoder. The model is jointly pretrained on diverse data, including manipulation trajectories, egocentric demonstrations, synthetic simulation data, and navigation data, and uses embodiment-aware prompt conditioning to support multiple robot platforms. A single action-and-trajectory prediction framework covers manipulation, navigation, and trajectory prediction tasks. Reported results include 97.9% on LIBERO, 73.7% on Simpler-WidowX, 69.0% oracle success rate (OSR) on R2R navigation, and 76.9% average out-of-distribution (OOD) success in real-world ALOHA experiments.