Almanac
paper

FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation

paperactiveprovisionalforce-efficient-vla-reinforcement-fine-tuning-via-value-calibrated-warm-up-and-self-distillation-53961590·1 events·first seen 19h ago

Aliases: FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.AI·19h ago·source ↗

FORCE: Efficient RL fine-tuning for Vision-Language-Action models via value-calibrated warm-up and self-distillation

Researchers introduce FORCE, a 3-stage reinforcement learning fine-tuning framework for Vision-Language-Action (VLA) models that addresses sample inefficiency caused by unstable Q-functions and low-quality exploration data. The framework uses a Value-Calibrated Warm-Up phase followed by Q-function-filtered policy updates, eliminating the need for costly human interventions during training. Evaluated on simulation and real-world robotic tasks, FORCE achieves a 79% absolute improvement in task success rates, outperforms prior RL methods by 10%, and accelerates training by 32.5%.