behavior cloning
behavior-cloning-4ee5f42c · 1 event · first seen 20d ago
Aliases: behavior cloning
Recent events (1)
AMRS: Rollout-Based World Model for Offline Affective Music Recommendation with DPO
LUCID's Affective Music Recommendation System (AMRS) uses a causal transformer world model trained on logged listening data to jointly predict engagement, ratings, and self-reported valence/arousal, enabling offline policy optimization without ethically problematic online experimentation. A recommender policy is initialized via behavior cloning and fine-tuned with Direct Preference Optimization (DPO) against a multi-objective utility function. The system is deployed on LUCID's health-and-wellness platforms serving clinical users (older adults with neurocognitive conditions) and consumer-wellness users across four modes. Under cold-start conditions, DPO improves predicted affective signals over the cloned baseline while maintaining diversity and avoiding distributional collapse.
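The two training stages named in the summary can be sketched with their standard loss functions. This is a minimal illustration only, not the AMRS implementation: the function names, the `beta` value, and the log-probabilities in the usage note are hypothetical. It also assumes that the preferred item in each pair is the one scoring higher under the utility function, which the summary does not state.

```python
import math


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def bc_loss(logged_item_log_probs):
    """Behavior cloning: mean negative log-likelihood of the logged choices.

    Each entry is the policy's log-probability of the item the user
    actually listened to in the logged data.
    """
    return -sum(logged_item_log_probs) / len(logged_item_log_probs)


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    The reference log-probabilities come from the frozen reference policy,
    assumed here to be the behavior-cloned policy. beta=0.1 is a
    placeholder, not a value reported in the source.
    """
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return neg_log_sigmoid(beta * margin)
```

With the policy still equal to the reference, the margin is zero and the DPO loss is log 2; it falls below that value once the policy assigns relatively more probability to the preferred item than the reference does, for example `dpo_loss(-0.5, -3.0, -1.0, -2.0)`.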