Almanac
technique

Hierarchical Advantage-Weighted Behavior Cloning

technique · active · provisional · hierarchical-advantage-weighted-behavior-cloning-c1ed5718 · 1 event · first seen 31h ago

Aliases: Hierarchical Advantage-Weighted Behavior Cloning

Recent events (1)

6 · arXiv · cs.LG · 31h ago

HABC: Hierarchical Advantage Weighting for Online RL Fine-Tuning of Vision-Language-Action Policies

Researchers introduce Hierarchical Advantage-Weighted Behavior Cloning (HABC), a method for fine-tuning pretrained Vision-Language-Action (VLA) policies with online reinforcement learning (RL) using only sparse binary episode outcomes. HABC trains separate critic heads for viability and efficiency objectives, combines them through a state-adaptive gate, and applies intervention-aware credit assignment to avoid incorrect supervision across human-intervention boundaries. On three contact-rich bimanual real-robot tasks, HABC raises success rates from supervised fine-tuning (SFT) baselines of 36%, 44%, and 12% to 92%, 88%, and 38%, respectively. The work targets credit assignment from sparse outcome signals: deciding which actions in a long episode deserve credit for a single success-or-failure label.
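
The summary does not give HABC's equations, so the following is only a minimal sketch of how the three ingredients it names could fit together in a generic advantage-weighted behavior cloning loss. Everything specific here is an assumption made for illustration: the function names, the linear gate blend, the exponential weighting with temperature `beta`, the weight clip, and the choice to give zero weight to steps whose credit would cross a human-intervention boundary. None of it is taken from the paper.

```python
import math


def gated_advantage(adv_viability, adv_efficiency, gate):
    """Blend two per-step advantages with a gate in [0, 1].

    Assumption: a simple convex combination. The summary only says the
    two critic heads are combined via a state-adaptive gate.
    """
    return gate * adv_viability + (1.0 - gate) * adv_efficiency


def habc_style_weights(adv_viability, adv_efficiency, gates,
                       crosses_boundary, beta=1.0, max_weight=20.0):
    """Per-step behavior cloning weights (hypothetical formulation).

    crosses_boundary[t] is True when the advantage estimate for step t
    would span a human-intervention boundary; such steps get weight 0
    here so they contribute no supervision (an illustrative choice).
    """
    weights = []
    for a_v, a_e, g, crossed in zip(adv_viability, adv_efficiency,
                                    gates, crosses_boundary):
        if crossed:
            weights.append(0.0)
            continue
        blended = gated_advantage(a_v, a_e, g)
        # Standard advantage-weighted form exp(A / beta), clipped for stability.
        weights.append(min(math.exp(blended / beta), max_weight))
    return weights


def weighted_bc_loss(log_probs, weights):
    """Weighted negative log-likelihood, normalized by the total weight."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total


# Two steps: the second one is masked because it crosses a boundary.
weights = habc_style_weights(
    adv_viability=[1.0, 0.0],
    adv_efficiency=[0.0, 0.0],
    gates=[1.0, 0.5],
    crosses_boundary=[False, True],
)
loss = weighted_bc_loss(log_probs=[-1.0, -2.0], weights=weights)
```

In this sketch a gate near 1 lets the viability head dominate and a gate near 0 lets the efficiency head dominate. In a real system the gate and both advantages would come from learned networks conditioned on the state.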