Entity · benchmark

HPSv3

benchmark · active · hpsv3-041b3911 · 2 events · first seen Jun 2, 2026

Aliases: HPSv3

Co-occurring entities

Flow-GRPO · HPDv2 · HPDv3 · Stable Diffusion 3.5 Large · DiT-Reward · SDXL Turbo · GenEval · SD-Turbo · Direct Preference Optimization (DPO) · Drifting Preference Optimization

More like this (12)

HPDv3 HPDv2 HPRO TimePNS HP PP-OCRv6 CH-SIMS HM3D SPDNet Hy3 NVFP4 MPI3D

Recent events (2)

6 · arXiv · cs.AI · Jun 23, 2026

DiT-Reward converts text-to-image diffusion transformers into reward models, outperforming HPSv3

DiT-Reward is a new reward modeling approach that repurposes pretrained text-to-image Diffusion Transformers (DiTs): it processes near-clean image latents and aggregates text-conditioned representations across transformer layers. Under matched training data, it outperforms HPSv3 on all four evaluated preference benchmarks, reaching 85.6% on HPDv2 and 77.6% on HPDv3. When used as the reward for optimizing Stable Diffusion 3.5 Large via Flow-GRPO, it yields clear gains in realism and achieves a 1.65× inference speedup over HPSv3. The work demonstrates that generative DiT representations transfer meaningfully to reward modeling and policy optimization.

Evaluation and Benchmarking · Alignment and RLHF · Flow-GRPO · HPSv3 · HPDv2
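
The benchmark percentages in the summary above are preference scores. One common way to compute such a score, assumed here rather than confirmed by the source, is pairwise accuracy: the fraction of human-labeled image pairs in which the reward model scores the preferred image higher. The sketch below illustrates that metric only; the function name, the `RewardFn` type, and the toy reward are hypothetical and are not part of HPSv3 or DiT-Reward.

```python
from typing import Any, Callable, Iterable, Tuple

# Hypothetical interface: a reward model maps (prompt, image) to a scalar score.
RewardFn = Callable[[str, Any], float]


def pairwise_preference_accuracy(
    reward: RewardFn,
    pairs: Iterable[Tuple[str, Any, Any]],
) -> float:
    """Fraction of (prompt, preferred, rejected) triples ranked correctly.

    A pair counts as correct only when the preferred image receives a
    strictly higher score than the rejected one.
    """
    correct = 0
    total = 0
    for prompt, preferred, rejected in pairs:
        total += 1
        if reward(prompt, preferred) > reward(prompt, rejected):
            correct += 1
    if total == 0:
        raise ValueError("pairs must contain at least one comparison")
    return correct / total


if __name__ == "__main__":
    # Toy stand-in reward: longer "image" strings score higher.
    def toy_reward(prompt: str, image: str) -> float:
        return float(len(image))

    toy_pairs = [
        ("a cat", "aaa", "a"),
        ("a cat", "a", "aa"),   # the toy reward gets this pair wrong
        ("a dog", "bbbb", "b"),
        ("a dog", "cc", "c"),
    ]
    print(pairwise_preference_accuracy(toy_reward, toy_pairs))
```

Comparing two reward models on the same labeled pairs, as in a matched-data comparison, then reduces to calling this function once per model.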

6 · arXiv · cs.LG · Jun 2, 2026

Drifting Preference Optimization (DrPO) for One-Step Text-to-Image Generators

DrPO is a new online preference fine-tuning method designed for deterministic one-step text-to-image generators such as SD-Turbo and SDXL-Turbo. These generators are difficult to align with standard RLHF methods that require policy likelihoods or differentiable reward gradients. The method samples candidates per prompt, ranks them with a target reward, and synthesizes a feature-space update direction from a non-parametric dipole preference field plus a reference drift from the frozen base model. Because the reward is used only for ranking, DrPO supports black-box and non-differentiable reward functions while keeping inference to a single forward pass. Evaluations on HPSv3 and GenEval show improved alignment over reward-gradient-free baselines and a 3.51× reduction in training compute, achieved by eliminating reward-model backpropagation.

Inference Economics · Alignment and RLHF · SDXL Turbo · HPSv3 · GenEval
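
The summary above notes that the reward is used only to rank candidates, which is why black-box and non-differentiable rewards are acceptable. The minimal sketch below illustrates just that ranking step under stated assumptions: `rank_candidates`, the toy reward, and the quantized variant are hypothetical names invented for illustration. It does not implement the dipole preference field or the reference drift, whose details the summary does not give.

```python
import math
from typing import Any, Callable, List, Sequence

# Hypothetical interface: any callable returning a scalar works, including a
# black-box or non-differentiable scorer, because only the ordering is used.
RewardFn = Callable[[str, Any], float]


def rank_candidates(
    prompt: str,
    candidates: Sequence[Any],
    reward: RewardFn,
) -> List[int]:
    """Return candidate indices ordered from highest to lowest reward.

    No gradient of `reward` is taken; it is called once per candidate.
    Ties keep their original relative order because `sorted` is stable.
    """
    scores = [reward(prompt, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy stand-in reward: longer "image" strings score higher.
    def toy_reward(prompt: str, image: str) -> float:
        return float(len(image))

    # A non-differentiable, strictly increasing transform of the same reward.
    def quantized_reward(prompt: str, image: str) -> float:
        return float(math.floor(math.exp(toy_reward(prompt, image))))

    samples = ["a", "bbb", "cc"]
    # Both rewards produce the same ordering of the candidates.
    print(rank_candidates("a cat", samples, toy_reward))
    print(rank_candidates("a cat", samples, quantized_reward))
```

Any strictly increasing transform of the reward leaves the ranking unchanged, so a method that consumes only the ranking never needs to backpropagate through the reward model.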