Almanac
5 · Hugging Face Blog · 1mo ago

Finetune Stable Diffusion Models with DDPO via TRL

Hugging Face's TRL library adds support for DDPO (Denoising Diffusion Policy Optimization), enabling reinforcement learning-based finetuning of Stable Diffusion models. This extends TRL's RLHF tooling beyond language models to image generation, allowing reward-driven optimization of diffusion models. The post demonstrates practical usage of the new DDPO trainer within the TRL ecosystem.
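
As a rough illustration of what reward-driven optimization of a diffusion model involves, the sketch below computes a PPO-style clipped surrogate loss of the kind DDPO applies to individual denoising steps. It is plain Python rather than TRL's API; the function names, the clip range, and the toy rewards and log-probabilities are all assumptions made for the example.

```python
import math

def normalize_rewards(rewards):
    """Convert raw per-image rewards into zero-mean, unit-variance advantages."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # small constant avoids division by zero
    return [(r - mean) / std for r in rewards]

def clipped_step_loss(logp_new, logp_old, advantage, clip_range=0.2):
    """PPO-style clipped loss for one denoising step of one sampled image."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Taking the max of the two negated terms gives the pessimistic bound.
    return max(-advantage * ratio, -advantage * clipped_ratio)

# Hypothetical rewards for four generated images (e.g. from an aesthetic scorer).
rewards = [0.2, 0.8, 0.5, 0.9]
advantages = normalize_rewards(rewards)

# Hypothetical log-probabilities of one denoising step under old and new policy.
logp_old = [-1.0, -1.2, -0.9, -1.1]
logp_new = [-1.1, -1.0, -0.9, -0.8]

losses = [clipped_step_loss(n, o, a)
          for n, o, a in zip(logp_new, logp_old, advantages)]
mean_loss = sum(losses) / len(losses)
print(mean_loss)
```

In a real run the log-probabilities would come from the Gaussian transition of each denoising step and the loss would be averaged over all steps and samples; this sketch only shows the shape of the objective.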

Related events (8)

5 · arXiv · cs.LG · 11d ago

DRPO: Smooth divergence regularization replaces hard masking in LLM RL training

A new arXiv preprint proposes Divergence Regularized Policy Optimization (DRPO), a method that replaces the hard trust-region mask used in DPPO with a smooth advantage-weighted quadratic regularizer on policy shift. The approach addresses a known weakness in PPO and GRPO where importance ratios poorly proxy distributional shift in long-tailed vocabularies, and in DPPO where gradient signals are discarded rather than corrected at trust-region boundaries. Experiments across model scales, architectures, and precision settings show improved stability and efficiency in LLM RL post-training.
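
The weakness described above, that a per-token importance ratio can be a poor proxy for distributional shift, can be seen with toy numbers. The sketch below is not DRPO's regularizer; it only compares the importance ratio against total variation distance on a hypothetical three-token vocabulary, and every probability in it is an assumption.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Case 1: only a rare tail token moves. The ratio is huge, the shift is tiny.
p_old_tail = [0.899999, 0.1, 0.000001]
p_new_tail = [0.89999, 0.1, 0.00001]
tail_ratio = p_new_tail[2] / p_old_tail[2]   # roughly 10
tail_tv = total_variation(p_old_tail, p_new_tail)

# Case 2: a head token moves. The ratio stays inside a typical clip range
# of [0.8, 1.2], yet the distribution shifts far more than in case 1.
p_old_head = [0.9, 0.1, 0.0]
p_new_head = [0.8, 0.2, 0.0]
head_ratio = p_new_head[0] / p_old_head[0]   # roughly 0.89
head_tv = total_variation(p_old_head, p_new_head)

print(tail_ratio, tail_tv)
print(head_ratio, head_tv)
```

A ratio-based clip would reject the first update and accept the second, even though the second changes the distribution by several orders of magnitude more.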

5 · Hugging Face Blog · 1mo ago

Using LoRA for Efficient Stable Diffusion Fine-Tuning

This Hugging Face blog post explains how Low-Rank Adaptation (LoRA) can be applied to fine-tune Stable Diffusion models efficiently. LoRA reduces the number of trainable parameters by decomposing weight updates into low-rank matrices, enabling fine-tuning on consumer hardware with significantly less memory. The post covers practical implementation details using the diffusers library.
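
To make the parameter saving concrete, the sketch below counts trainable parameters for a full weight update versus a rank-r factorization, and forms a small update matrix from its two factors. It is plain Python, not the diffusers API; the layer size (768 x 768), the rank (4), and the helper names are assumptions chosen for illustration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in               # updating W directly
    lora = rank * (d_in + d_out)      # B is d_out x rank, A is rank x d_in
    return full, lora

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

full, lora = lora_param_counts(768, 768, 4)
print(full, lora, full // lora)   # 589824 6144 96

# A rank-1 update on a tiny 3 x 4 layer: delta_W = B @ A.
B = [[1.0], [2.0], [3.0]]         # 3 x 1
A = [[0.5, -1.0, 0.0, 2.0]]       # 1 x 4
delta_W = matmul(B, A)            # 3 x 4, but only 7 numbers were trained
```

Only A and B receive gradients; the original weight stays frozen and the product is added to it at inference time.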

5 · Hugging Face Blog · 1mo ago

Preference Tuning LLMs with Direct Preference Optimization Methods

A Hugging Face blog post surveys Direct Preference Optimization (DPO) and related preference tuning methods for aligning large language models. The post covers the landscape of DPO variants and their practical application via the TRL library. It serves as a technical reference for practitioners implementing RLHF alternatives.

5 · Hugging Face Blog · 1mo ago

Fine-tune Llama 2 with DPO

This Hugging Face blog post provides a practical guide to fine-tuning Llama 2 using Direct Preference Optimization (DPO) via the TRL library. Unlike standard RLHF, DPO does not require a separate reward model; the post walks through dataset preparation, training configuration, and implementation details. It targets practitioners looking to apply preference-based alignment to open-weights models.
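
A minimal sketch of the DPO objective for a single preference pair shows why no reward model is needed: the loss depends only on log-probabilities from the policy and a frozen reference model. This is plain Python rather than TRL's trainer; the function name, the beta value, and the example log-probabilities are assumptions.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy favors each response than the reference does.
    chosen_gain = pi_chosen - ref_chosen
    rejected_gain = pi_rejected - ref_rejected
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Hypothetical log-probabilities: the policy already prefers the chosen answer.
good = dpo_loss(pi_chosen=-10.0, pi_rejected=-12.0,
                ref_chosen=-11.0, ref_rejected=-11.0)
# Same numbers with the preference reversed.
bad = dpo_loss(pi_chosen=-12.0, pi_rejected=-10.0,
               ref_chosen=-11.0, ref_rejected=-11.0)
print(good, bad)
```

The loss falls below log 2 when the policy prefers the chosen response more strongly than the reference does, and rises above it otherwise.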

4 · Hugging Face Blog · 1mo ago

The Annotated Diffusion Model

This Hugging Face blog post provides a detailed, annotated walkthrough of diffusion models for image generation, covering the mathematical foundations and implementation details of denoising diffusion probabilistic models (DDPMs). The post serves as an educational deep-dive into the architecture and training process of diffusion-based generative models. Published in mid-2022, it coincides with the period of rapid growth in diffusion model adoption.
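
For readers new to DDPMs, the closed-form forward (noising) process is the usual starting point: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below implements it in plain Python under a linear beta schedule. It is not code from the post; the schedule endpoints, the number of timesteps, and the helper names are assumptions.

```python
import math
import random

def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances from beta_start to beta_end."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 without simulating intermediate steps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)              # fixed seed for repeatability
x0 = [1.0, -1.0, 0.5]               # a toy three-pixel "image"
x_early = q_sample(x0, 0, alpha_bars, rng)     # almost unchanged
x_late = q_sample(x0, 999, alpha_bars, rng)    # almost pure noise
```

Training then amounts to teaching a network to predict the noise eps from x_t and t, which is what makes the reverse (denoising) process possible.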

6 · arXiv · cs.LG · 18d ago

Drifting Preference Optimization (DrPO) for One-Step Text-to-Image Generators

DrPO is a new online preference fine-tuning method designed specifically for deterministic one-step text-to-image generators like SD-Turbo and SDXL-Turbo, which are difficult to align with standard RLHF methods that require policy likelihoods or differentiable reward gradients. The method samples candidates per prompt, ranks them with a target reward, and synthesizes a feature-space update direction via a non-parametric dipole preference field plus a reference drift from the frozen base model. Because the reward is used only for ranking, DrPO supports black-box and non-differentiable reward functions while keeping inference as a single forward pass. Evaluations on HPSv3 and GenEval show improved alignment over reward-gradient-free baselines and a 3.51× reduction in training compute by eliminating reward-model backpropagation.

4 · Hugging Face Blog · 1mo ago

Training Stable Diffusion with Dreambooth using Diffusers

This Hugging Face blog post describes how to fine-tune Stable Diffusion models using the DreamBooth technique via the Diffusers library. DreamBooth enables personalized text-to-image generation by training a model on a small set of reference images. The post covers the technical workflow for applying this fine-tuning approach within the Diffusers ecosystem.

6 · Hugging Face Blog · 1mo ago

TRL v1.0: Post-Training Library Built to Move with the Field

Hugging Face has released TRL v1.0, a major milestone for its post-training library focused on reinforcement learning from human feedback and related alignment techniques. The release signals a stabilization of the API and feature set after iterative development tracking the rapidly evolving post-training landscape. TRL is widely used in the open-source community for fine-tuning and aligning language models using methods such as PPO, DPO, and GRPO.