technique

Flow-GRPO

technique · active · provisional · flow-grpo-29bc2d68 · 1 event · first seen 15h ago

Aliases: Flow-GRPO

Co-occurring entities

HPSv3 · HPDv2 · HPDv3 · Stable Diffusion 3.5 Large · DiT-Reward

More like this (12)

N-GRPO · GRPO · AdvGRPO · GRPO (Group Relative Policy Optimization) · Flow · IH-GRPO · RAGFlow · GraphPO · DRFLOW · Latent-Anchored GRPO · Cognizant Flowsource · Self-Flow

Recent events (1)

6 · arXiv · cs.AI · 15h ago

DiT-Reward converts text-to-image diffusion transformers into reward models, outperforming HPSv3

DiT-Reward is a new reward modeling approach that repurposes pretrained text-to-image Diffusion Transformers (DiTs) by processing near-clean image latents and aggregating text-conditioned representations across transformer layers. Trained on the same data as HPSv3, it outperforms HPSv3 on all four evaluated preference benchmarks, reaching 85.6% on HPDv2 and 77.6% on HPDv3. When used as the reward model for optimizing Stable Diffusion 3.5 Large via Flow-GRPO, DiT-Reward yields clear gains in realism and achieves a 1.65x inference speedup over HPSv3. The work demonstrates that generative DiT representations transfer meaningfully to reward modeling and policy optimization.
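For context on where a reward model's scores enter GRPO-style optimization, here is a minimal sketch of the group-relative advantage step that gives GRPO (Group Relative Policy Optimization) its name: rewards for several samples from the same prompt are normalized against that group's own mean and standard deviation. The summary above does not describe Flow-GRPO's internals, so this shows only the generic normalization, not the flow-matching specifics. The function name, the `eps` constant, the choice of population standard deviation, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Assumption: population standard deviation is used here; implementations
    differ on this detail. `eps` avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images sampled from one prompt.
scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(scores)
```

Samples scoring above the group mean receive positive advantages and those below receive negative ones, so the policy update depends on relative ranking within the group rather than on the absolute scale of the reward model.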

Evaluation and Benchmarking · Alignment and RLHF · Flow-GRPO · HPSv3 · HPDv2 · +4 more