paper
FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation
paper · active · provisional
force-efficient-vla-reinforcement-fine-tuning-via-value-calibrated-warm-up-and-self-distillation-53961590 · 1 event · first seen 19h ago
Aliases: FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation
More like this (12)
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
importance-weighted supervised fine-tuning
Parameter-Efficient Fine-Tuning
reinforcement fine-tuning
supervised fine-tuning
Learning from the Self-future: On-policy Self-distillation for dLLMs
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
Entropy-Regularized Reinforcement Learning
A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
InSight: Self-Guided Skill Acquisition via Steerable VLAs
InSight: Self-Guided Skill Acquisition via Steerable VLAs
Gradient-Guided Reward Optimization
Recent events (1)
FORCE: Efficient RL fine-tuning for Vision-Language-Action models via value-calibrated warm-up and self-distillation
Researchers introduce FORCE, a three-stage reinforcement learning fine-tuning framework for Vision-Language-Action (VLA) models that targets the sample inefficiency caused by unstable Q-functions and low-quality exploration data. The framework begins with a Value-Calibrated Warm-Up phase and then applies Q-function-filtered policy updates, eliminating the need for costly human interventions during training. Evaluated on simulated and real-world robotic tasks, FORCE achieves a 79% absolute improvement in task success rate, outperforms prior RL methods by 10%, and accelerates training by 32.5%.
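The summary does not say how the Q-function filter is applied, so the following is only a minimal sketch of the general idea of filtering exploration data with a value estimate before a policy update. It is not FORCE's implementation: the names `Transition`, `filter_by_q`, and `toy_q`, the fixed threshold rule, and the toy critic are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Transition:
    """One (state, action) pair collected during exploration (hypothetical type)."""
    state: Sequence[float]
    action: Sequence[float]


def filter_by_q(
    transitions: List[Transition],
    q_fn: Callable[[Sequence[float], Sequence[float]], float],
    threshold: float,
) -> List[Transition]:
    """Keep only transitions whose estimated Q-value is at least `threshold`.

    Assumption: a fixed scalar threshold; the paper may use a different rule.
    """
    return [t for t in transitions if q_fn(t.state, t.action) >= threshold]


def toy_q(state: Sequence[float], action: Sequence[float]) -> float:
    # Hypothetical stand-in for a learned critic: it scores an action higher
    # the closer it is to the state vector (negative squared distance).
    return -sum((a - s) ** 2 for s, a in zip(state, action))


rollouts = [
    Transition(state=(0.0, 0.0), action=(0.1, 0.0)),  # close to state, high Q
    Transition(state=(0.0, 0.0), action=(2.0, 2.0)),  # far from state, low Q
    Transition(state=(1.0, 1.0), action=(1.0, 0.9)),  # close to state, high Q
]

# Only the retained transitions would be passed on to the policy update.
kept = filter_by_q(rollouts, toy_q, threshold=-0.5)
```

In this toy setup the second rollout is discarded because its estimated value falls below the threshold. A filter of this kind is only as reliable as the critic behind it, which is consistent with the summary's emphasis on calibrating the value function before filtered updates begin.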