Entity · paper

AHA-WAM

paper · active · aha-wam-c5e05d10 · 1 event · first seen Jun 9, 2026

Aliases: AHA-WAM

Co-occurring entities

RoboTwin · Linear Diffusion Transformer · Observation-Guided Video-Context Routing · Fast-WAM

More like this (12)

BadWAM · Fast-WAM · DAAM · AAAI · AMIA · AIEWF · AWQ · ASAM · AMRS · AASIST · AuRA · DAIC-WOZ

Recent events (1)

6 · arXiv · cs.AI · Jun 9, 2026

AHA-WAM: Asynchronous world-action modeling with temporal decoupling for robot manipulation

AHA-WAM introduces a dual Diffusion Transformer (DiT) architecture that decouples low-frequency world prediction from high-frequency action execution in robot manipulation policies. It targets an inefficiency of existing world-action models, which force both branches to run at the same temporal resolution. A video DiT with a rolling key-value memory serves as a long-horizon scene planner, while a fast action DiT queries layerwise latent context via joint attention; Observation-Guided Video-Context Routing enables the two branches to execute asynchronously. AHA-WAM achieves 92.80% average success on RoboTwin benchmarks and 78.3% on real-world tasks while running at 24.17 Hz, a 4.59× speedup over Fast-WAM, without robot-data pretraining.
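
The temporal decoupling described above can be illustrated with a toy scheduling sketch. This is not the paper's implementation: the function `run_async_loop`, the `AsyncLoopTrace` record, and the `refresh_every` parameter are hypothetical names chosen for illustration, and the fixed refresh interval stands in for whatever routing rule the actual system uses. The sketch only shows the general pattern of a slow branch that refreshes a cached context occasionally while a fast branch acts on every control step.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncLoopTrace:
    """Records when the slow branch ran and how stale the context was per step."""
    refresh_steps: list = field(default_factory=list)
    context_age: list = field(default_factory=list)


def run_async_loop(num_steps: int, refresh_every: int) -> AsyncLoopTrace:
    """Toy schedule: the slow (world) branch runs once per `refresh_every` steps.

    `refresh_every` is a hypothetical parameter, not a value from the paper.
    """
    if refresh_every < 1:
        raise ValueError("refresh_every must be at least 1")
    trace = AsyncLoopTrace()
    last_refresh = None
    for step in range(num_steps):
        # Slow branch: recompute the cached context only when it is due.
        if last_refresh is None or step - last_refresh >= refresh_every:
            last_refresh = step
            trace.refresh_steps.append(step)
        # Fast branch: act on every step using the cached context,
        # whose age (in steps) is recorded here.
        trace.context_age.append(step - last_refresh)
    return trace


trace = run_async_loop(num_steps=10, refresh_every=4)
print(trace.refresh_steps)  # steps on which the slow branch ran
print(trace.context_age)    # staleness of the cached context at each step
```

For scale, a control rate of 24.17 Hz corresponds to a budget of roughly 41 ms per action step (1000 / 24.17), which is why running a heavy video model on every step would be costly.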

Inference Economics · RoboTwin · Linear Diffusion Transformer · Observation-Guided Video-Context Routing · +2 more