Almanac
technique

Denoising Diffusion Policy Optimization

techniqueactivedenoising-diffusion-policy-optimization-c401f75b·1 events·first seen 28d ago

Aliases: Denoising Diffusion Policy Optimization

Co-occurring entities

More like this (12)

Recent events (1)

5Hugging Face Blog·28d ago·source ↗

Finetune Stable Diffusion Models with DDPO via TRL

Hugging Face's TRL library adds support for DDPO (Denoising Diffusion Policy Optimization), enabling reinforcement learning-based finetuning of Stable Diffusion models. This extends TRL's RLHF tooling beyond language models to image generation, allowing reward-driven optimization of diffusion models. The post demonstrates practical usage of the new DDPO trainer within the TRL ecosystem.