paper

FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation



Co-occurring entities

FORCE

More like this (12)

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
importance-weighted supervised fine-tuning
Parameter-Efficient Fine-Tuning
reinforcement fine-tuning
supervised fine-tuning
Learning from the Self-future: On-policy Self-distillation for dLLMs
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
Entropy-Regularized Reinforcement Learning
A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
InSight: Self-Guided Skill Acquisition via Steerable VLAs
InSight: Self-Guided Skill Acquisition via Steerable VLAs
Gradient-Guided Reward Optimization

Recent events (1)

arXiv · cs.AI · 19h ago

FORCE: Efficient RL fine-tuning for Vision-Language-Action models via value-calibrated warm-up and self-distillation

Researchers introduce FORCE, a three-stage reinforcement learning (RL) fine-tuning framework for Vision-Language-Action (VLA) models. It targets the sample inefficiency caused by unstable Q-functions and low-quality exploration data. The framework runs a Value-Calibrated Warm-up phase and then applies Q-function-filtered policy updates, which removes the need for costly human intervention during training. Evaluated on simulated and real-world robotic tasks, FORCE achieves a 79% absolute improvement in task success rate, outperforms prior RL methods by 10%, and accelerates training by 32.5%.
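The summary does not say how the Q-function filter is applied, so the following is only a minimal sketch of one common way to filter policy samples with a critic, not FORCE's actual rule. It assumes a generic advantage-style criterion: among several candidate actions for a state, keep those whose Q-value is at least the mean Q-value of the candidates, and treat the survivors as supervised targets for the policy. The names `filter_by_q` and `toy_q`, and the toy critic itself, are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def filter_by_q(
    state: State,
    candidate_actions: Sequence[Action],
    q_fn: Callable[[State, Action], float],
) -> List[Action]:
    """Keep candidates whose Q-value is at least the mean Q over all candidates.

    Assumption: the mean over candidates acts as a simple value baseline.
    The paper's filtering criterion may differ.
    """
    if not candidate_actions:
        return []
    q_values = [q_fn(state, a) for a in candidate_actions]
    baseline = sum(q_values) / len(q_values)
    return [a for a, q in zip(candidate_actions, q_values) if q >= baseline]


def toy_q(state: State, action: Action) -> float:
    # Hypothetical critic: scores an action by negative squared distance to the state.
    return -sum((s - a) ** 2 for s, a in zip(state, action))


state = (0.0, 0.0)
candidates = [(0.1, 0.0), (1.0, 1.0), (0.0, -0.2), (2.0, 0.0)]

# Only the two candidates near the state score above the mean and are kept.
kept = filter_by_q(state, candidates, toy_q)
print(kept)
```

In a real pipeline the kept actions would feed a supervised (self-distillation) loss on the policy. That step is omitted here because the summary gives no detail on it.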
