Hierarchical Advantage-Weighted Behavior Cloning
Recent events (1)
HABC: Hierarchical Advantage Weighting for Online RL Fine-Tuning of Vision-Language-Action Policies
Researchers introduce Hierarchical Advantage-Weighted Behavior Cloning (HABC), a method for fine-tuning pretrained Vision-Language-Action (VLA) policies with online RL using only sparse binary episode outcomes. HABC trains separate critic heads for viability and efficiency objectives, combines them through a state-adaptive gate, and applies intervention-aware credit assignment to avoid incorrect supervision across human-intervention boundaries. On three contact-rich bimanual real-robot tasks, HABC raises success rates over the supervised fine-tuning (SFT) baselines from 36% to 92%, 44% to 88%, and 12% to 38%. The work addresses a fundamental credit assignment problem in robot learning from sparse outcome signals.
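The summary names three ingredients: two critic heads, a state-adaptive gate, and credit assignment that respects intervention boundaries. The sketch below is one way these could fit together in a generic advantage-weighted behavior cloning loss. It is an illustration under stated assumptions, not the authors' implementation. The function names, the convex gate blend, the exponential weighting with temperature `beta`, the weight cap, and the caller-supplied `policy_mask` are all hypothetical; the summary does not specify how HABC computes any of them.

```python
import numpy as np


def gated_advantage(adv_viability, adv_efficiency, gate):
    """Blend two per-step advantage estimates with a per-step gate in [0, 1].

    Assumption: the gate is a convex mixing weight (1 = viability only,
    0 = efficiency only). The source only says the gate is state-adaptive.
    """
    gate = np.clip(gate, 0.0, 1.0)
    return gate * adv_viability + (1.0 - gate) * adv_efficiency


def awbc_weights(advantage, policy_mask, beta=1.0, max_weight=20.0):
    """Exponential advantage weights, zeroed where policy_mask is 0.

    Assumption: policy_mask marks the steps that should receive supervision;
    which steps those are around a human intervention is left to the caller.
    """
    weights = np.minimum(np.exp(advantage / beta), max_weight)
    return weights * policy_mask


def weighted_bc_loss(neg_log_prob, weights):
    """Weighted mean of per-step negative log-likelihoods."""
    total = weights.sum()
    if total == 0:
        return 0.0
    return float((weights * neg_log_prob).sum() / total)


# Toy three-step trajectory with made-up numbers.
adv_v = np.array([1.0, -1.0, 0.0])   # viability-head advantages
adv_e = np.array([0.0, 1.0, 0.0])    # efficiency-head advantages
gate = np.array([1.0, 0.0, 0.5])     # per-step gate values
mask = np.array([1.0, 0.0, 1.0])     # step 1 is excluded from supervision
nll = np.array([1.0, 5.0, 3.0])      # per-step -log pi(a | s)

blended = gated_advantage(adv_v, adv_e, gate)
weights = awbc_weights(blended, mask)
loss = weighted_bc_loss(nll, weights)
```

In this toy run the masked step contributes nothing to the loss regardless of its advantage, which is the property an intervention-aware scheme needs at a boundary; how the mask and the gate are actually produced is the part the paper itself would have to supply.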