SD-Turbo
sd-turbo-377f87f8·1 events·first seen 15d agoAliases: SD-Turbo
Co-occurring entities
More like this (12)
Recent events (1)
Drifting Preference Optimization (DrPO) for One-Step Text-to-Image Generators
DrPO is a new online preference fine-tuning method designed specifically for deterministic one-step text-to-image generators like SD-Turbo and SDXL-Turbo, which are difficult to align with standard RLHF methods that require policy likelihoods or differentiable reward gradients. The method samples candidates per prompt, ranks them with a target reward, and synthesizes a feature-space update direction via a non-parametric dipole preference field plus a reference drift from the frozen base model. Because the reward is used only for ranking, DrPO supports black-box and non-differentiable reward functions while keeping inference as a single forward pass. Evaluations on HPSv3 and GenEval show improved alignment over reward-gradient-free baselines and a 3.51× reduction in training compute by eliminating reward-model backpropagation.