4 · Hugging Face Blog · 1mo ago

VQ-Diffusion: Vector Quantized Diffusion Models on Hugging Face

This Hugging Face blog post introduces VQ-Diffusion, a text-to-image generation approach that combines vector quantization with diffusion models. The method operates in a discrete latent space defined by a VQ-VAE codebook, applying the diffusion process to token sequences rather than to continuous pixel or latent representations. The post likely covers integration into the Hugging Face Diffusers ecosystem and demonstrates generation capabilities.
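To make the "discrete latent space" idea concrete, here is a minimal pure-Python sketch; it is not code from the post or from Diffusers. It assumes a toy 2-D codebook, and its corruption step loosely follows the mask-and-replace forward process described in the VQ-Diffusion paper: each token is kept, resampled uniformly from the codebook, or replaced by a special mask token. The `MASK` id and the `alpha`/`beta` values are hypothetical.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] token


def quantize(vectors, codebook):
    """Return the index of the nearest codebook entry (squared L2) for each vector."""
    tokens = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


def corrupt_step(tokens, codebook_size, alpha, beta, rng):
    """One forward step of a mask-and-replace style discrete diffusion.

    A non-mask token is kept with probability alpha, resampled uniformly from
    the codebook with probability beta, and masked otherwise. Masked tokens
    stay masked, so repeated steps drive the sequence toward all-MASK.
    """
    out = []
    for t in tokens:
        if t == MASK:
            out.append(MASK)
            continue
        u = rng.random()
        if u < alpha:
            out.append(t)
        elif u < alpha + beta:
            out.append(rng.randrange(codebook_size))
        else:
            out.append(MASK)
    return out


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # toy 3-entry codebook
    tokens = quantize([[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]], codebook)
    print(tokens)  # [0, 1, 2]
    rng = random.Random(0)
    x = tokens * 4
    for step in range(1, 31):
        x = corrupt_step(x, len(codebook), alpha=0.9, beta=0.02, rng=rng)
        if step % 10 == 0:
            print(step, x)
```

Generation runs in the opposite direction: a learned network starts from a heavily masked sequence, predicts clean tokens step by step, and the VQ-VAE decoder maps the final tokens back to pixels.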

Agent and Tool Ecosystem · Multimodal Progress · VQ-VAE · Hugging Face · VQ-Diffusion · Diffusers

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Related events (8)

5 · Hugging Face Blog · 1mo ago

Exploring Quantization Backends in Diffusers

Hugging Face published a technical overview of quantization backends available in the Diffusers library for image and video generation models. The post covers integration with multiple quantization frameworks (likely bitsandbytes, GGUF, torchao, and similar) and their trade-offs for diffusion model inference. It targets practitioners seeking to reduce memory footprint and improve throughput when deploying diffusion models.
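The memory side of those trade-offs is mostly arithmetic on bits per weight. The sketch below is a back-of-the-envelope estimator, not a Diffusers API. The 12-billion parameter count is a hypothetical example, and the optional per-block scale overhead illustrates block-wise formats such as GGUF's Q8_0 and Q4_0, where 32 weights share one 16-bit scale. Activations, text encoders, and the VAE are ignored.

```python
def weight_memory_gib(num_params, bits_per_weight, block_size=None, scale_bits=16):
    """Approximate storage for (quantized) weights, in GiB.

    If block_size is given, every block of `block_size` weights is assumed to
    carry one extra scale of `scale_bits` bits (block-wise quantization).
    """
    effective_bits = bits_per_weight
    if block_size is not None:
        effective_bits += scale_bits / block_size
    return num_params * effective_bits / 8 / 1024**3


if __name__ == "__main__":
    n = 12_000_000_000  # hypothetical diffusion transformer size
    for name, bits, block in [("bf16", 16, None), ("int8", 8, None),
                              ("Q8_0-style", 8, 32), ("Q4_0-style", 4, 32)]:
        print(f"{name:>11}: {weight_memory_gib(n, bits, block):6.2f} GiB")
```

Fewer bits per weight reduce memory, but throughput does not automatically improve: dequantization can add overhead unless the backend has efficient low-bit kernels for the target hardware.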

Inference Economics · Agent and Tool Ecosystem · torchao · GGUF · Hugging Face · +2 more

5 · Hugging Face Blog · 1mo ago

Memory-efficient Diffusion Transformers with Quanto and Diffusers

This Hugging Face blog post describes integrating the Quanto quantization library with the Diffusers framework to reduce memory requirements for diffusion transformer models. The approach enables running large image/video generation models on consumer-grade hardware by applying int8 and int4 quantization to model weights. The post covers practical implementation details and benchmarks showing memory savings for models like Flux and others in the diffusion transformer family.
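As an illustration of what int8 and int4 weight quantization does to a weight row, here is a minimal sketch of symmetric per-row (absmax) quantization plus int4 packing. It is a generic technique, not Quanto's implementation; Quanto's own scheme, kernels, and API are not reproduced here, and the example row is made up.

```python
def quantize_row(row, bits):
    """Symmetric absmax quantization of one weight row to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    absmax = max(abs(w) for w in row)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    return [v * scale for v in q]


def pack_int4(q):
    """Pack pairs of signed 4-bit ints (range -8..7) into bytes, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even length
    return bytes((a & 0xF) | ((b & 0xF) << 4) for a, b in zip(q[0::2], q[1::2]))


def unpack_int4(data, n):
    """Inverse of pack_int4, sign-extending each nibble; returns the first n values."""
    out = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)
    return out[:n]


if __name__ == "__main__":
    row = [0.5, -1.0, 0.25, 0.0, 0.8]
    for bits in (8, 4):
        q, scale = quantize_row(row, bits)
        err = max(abs(a - b) for a, b in zip(row, dequantize_row(q, scale)))
        print(f"int{bits}: q={q} max_abs_error={err:.4f}")
    q4, _ = quantize_row(row, 4)
    packed = pack_int4(q4)
    print(len(row), "weights ->", len(packed), "bytes")  # 5 weights -> 3 bytes
```

The design trade-off is visible in the output: int4 halves storage again relative to int8 but roughly increases the worst-case rounding error by the ratio of the two scales (127/7).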

Inference Economics Agent and Tool Ecosystem Quanto Linear Diffusion Transformer Hugging Face +3 more

4 · Hugging Face Blog · 1mo ago

The Annotated Diffusion Model

A Hugging Face blog post providing a detailed, annotated walkthrough of diffusion models for image generation, likely covering the mathematical foundations and implementation details of denoising diffusion probabilistic models (DDPMs). The post serves as an educational deep-dive into the architecture and training process of diffusion-based generative models. Published in mid-2022, it coincided with a period of rapid growth in diffusion model adoption.
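The central DDPM identity such walkthroughs build on is the closed-form forward process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the running product of (1 − β_s) and ε is standard Gaussian noise. The pure-Python sketch below assumes the linear schedule from the original DDPM paper (β from 1e-4 to 0.02 over 1,000 steps); it illustrates the math rather than reproducing the post's own PyTorch code.

```python
import math
import random


def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T spaced linearly."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Jump straight to step t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.

    Returns (x_t, eps); a denoiser is trained to predict eps from (x_t, t)
    with a mean-squared-error loss.
    """
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


if __name__ == "__main__":
    abar = alpha_bars(linear_beta_schedule(1000))
    rng = random.Random(0)
    for t in (0, 250, 999):
        xt, _ = q_sample([1.0, -0.5, 0.25], t, abar, rng)
        print(t, round(abar[t], 5), [round(v, 3) for v in xt])
```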

Multimodal Progress · DDPM · Denoising Diffusion Probabilistic Models · Hugging Face

7 · Hugging Face Blog · 1mo ago

Stable Diffusion with 🧨 Diffusers

Hugging Face published a blog post introducing Stable Diffusion integration with their Diffusers library, covering the model's architecture and how to run it using the open-source tooling. The post appeared at the time of Stable Diffusion's public release in August 2022, marking a significant moment in accessible text-to-image generation. It served as both a technical introduction and a practical guide for the community to adopt the model.
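Two mechanisms shape how Stable Diffusion runs in Diffusers: the denoising UNet works on a VAE latent rather than on pixels, and text conditioning is strengthened with classifier-free guidance. The sketch below assumes Stable Diffusion v1's VAE settings (8x spatial downsampling, 4 latent channels) and uses 7.5, the pipeline's default `guidance_scale`, as an example value. The guidance function is plain list arithmetic, not the Diffusers implementation.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (C, H, W) of the VAE latent that the UNet denoises."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the VAE downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def classifier_free_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """Move the noise prediction from the unconditional one toward (and past) the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]


if __name__ == "__main__":
    c, h, w = latent_shape(512, 512)
    print((c, h, w))                        # (4, 64, 64)
    print((512 * 512 * 3) // (c * h * w))   # 48 pixel values per latent value
    print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0]))  # [7.5, 1.0]
```

With `guidance_scale=1.0` the result equals the text-conditioned prediction; larger values follow the prompt more closely at some cost in sample diversity.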

Open Weights Progress · Agent and Tool Ecosystem · Stable Diffusion 3 · Hugging Face · Stability AI · +2 more

5 · Hugging Face Blog · 1mo ago

Introducing Würstchen: Fast Diffusion for Image Generation

Hugging Face introduces Würstchen, a latent diffusion architecture designed for fast and efficient image generation. The model operates in a highly compressed latent space (a reported 42x spatial compression, with 1024×1024 images represented by 24×24 latents), which significantly reduces computational requirements compared to standard latent diffusion models. It is being integrated into the Diffusers library, making it accessible to the broader community.
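A quick calculation shows why the degree of latent compression matters. The sketch assumes the figures reported for Würstchen (1024×1024 images mapped to a 24×24 latent in its text-conditional Stage C, roughly 42x spatial compression) and compares them with a Stable-Diffusion-style VAE using 8x downsampling at the same resolution. It counts spatial positions only and ignores channel counts and the cost of Würstchen's decoding stages.

```python
def spatial_compression(image_side, latent_side):
    """Downsampling factor along one side of a square image."""
    return image_side / latent_side


def latent_positions(latent_side):
    """Number of spatial positions the diffusion model processes per step."""
    return latent_side * latent_side


if __name__ == "__main__":
    image_side = 1024
    wuerstchen_side = 24          # reported Würstchen Stage C latent size
    sd_side = image_side // 8     # SD-style VAE with 8x downsampling
    print(round(spatial_compression(image_side, wuerstchen_side), 2))  # 42.67
    ratio = latent_positions(sd_side) / latent_positions(wuerstchen_side)
    print(f"{latent_positions(sd_side)} vs {latent_positions(wuerstchen_side)} positions ({ratio:.1f}x fewer)")
```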

Open Weights Progress · Inference Economics · Hugging Face · Würstchen · latent diffusion · +2 more

6 · arXiv · cs.AI · 25d ago

Channel-wise Vector Quantization (CVQ): A New Image Tokenization Paradigm with Next-Channel Prediction

Researchers introduce Channel-wise Vector Quantization (CVQ), which replaces conventional patch-wise discrete tokens with channel-wise tokens that represent an image as discrete levels of visual detail. Built on CVQ, the Channel-wise Autoregressive (CAR) model uses a 'next-channel prediction' objective, generating images by progressively refining from global structure to fine-grained attributes. CVQ achieves 100% codebook utilization with a 16K+ codebook and the CAR model scores 86.7 on DPG and 0.79 on GenEval for text-to-image generation. The approach offers a structural alternative to raster-order patch-based autoregressive image generation.
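Codebook utilization is usually measured as the fraction of codebook entries that actually occur when images are encoded. The sketch below computes that metric and contrasts, on a toy H×W×C feature map, a raster-order patch-wise token sequence with a channel-wise one. The channel-wise grouping here (one vector per channel spanning all positions) is an assumption made purely for illustration; it is not CVQ's actual tokenizer, whose details this summary does not give.

```python
def codebook_utilization(token_ids, codebook_size):
    """Fraction of codebook entries that appear at least once."""
    return len(set(token_ids)) / codebook_size


def patchwise_vectors(fmap):
    """Raster order: one C-dim vector per spatial position (H*W tokens)."""
    return [pixel for row in fmap for pixel in row]


def channelwise_vectors(fmap):
    """Hypothetical channel-wise grouping: one H*W-dim vector per channel (C tokens)."""
    channels = len(fmap[0][0])
    return [[pixel[c] for row in fmap for pixel in row] for c in range(channels)]


if __name__ == "__main__":
    H, W, C = 2, 3, 4
    fmap = [[[100 * i + 10 * j + c for c in range(C)] for j in range(W)] for i in range(H)]
    patches = patchwise_vectors(fmap)
    chans = channelwise_vectors(fmap)
    print(len(patches), len(patches[0]))  # 6 4
    print(len(chans), len(chans[0]))      # 4 6
    print(codebook_utilization([3, 3, 7, 0, 7], codebook_size=8))  # 0.375
```

Each vector would then be quantized against a codebook; a next-channel model would emit the channel tokens one after another instead of walking the spatial grid in raster order.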

Frontier Model Releases · Evaluation and Benchmarking · Channel-wise Vector Quantization · DPG Benchmark · GenEval · +4 more

5 · Hugging Face Blog · 1mo ago

State of open video generation models in Diffusers

Hugging Face published a survey of open-source video generation models integrated into the Diffusers library as of January 2025. The post covers the current landscape of available open video generation models, their capabilities, and how they are supported within the Diffusers ecosystem. This serves as a reference for practitioners looking to use or compare open-weights video generation models.

Open Weights Progress · Agent and Tool Ecosystem · Hugging Face · video generation · Diffusers · +1 more

4 · Hugging Face Blog · 1mo ago

Training Stable Diffusion with Dreambooth using Diffusers

This Hugging Face blog post describes how to fine-tune Stable Diffusion models using the DreamBooth technique via the Diffusers library. DreamBooth enables personalized text-to-image generation by training a model on a small set of reference images. The post covers the technical workflow for applying this fine-tuning approach within the Diffusers ecosystem.
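DreamBooth's prior-preservation idea can be written as a single loss: the usual noise-prediction error on the subject images plus a weighted error on generated images of the generic class, which discourages the model from forgetting what that class looks like. The sketch below is plain list arithmetic under that assumption. In the Diffusers example training script this behavior is toggled by the `--with_prior_preservation` and `--prior_loss_weight` flags, but the function names here are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(pred_instance, noise_instance,
                    pred_prior=None, noise_prior=None, prior_loss_weight=1.0):
    """Noise-prediction loss on subject images, plus an optional class-prior term."""
    loss = mse(pred_instance, noise_instance)
    if pred_prior is not None:
        loss += prior_loss_weight * mse(pred_prior, noise_prior)
    return loss


if __name__ == "__main__":
    # Toy noise predictions and targets for one subject image and one class image.
    print(dreambooth_loss([0.0, 0.0], [1.0, 1.0]))                          # 1.0
    print(dreambooth_loss([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [2.0, 2.0],
                          prior_loss_weight=0.5))                           # 3.0
```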

Open Weights Progress · Agent and Tool Ecosystem · Hugging Face · Diffusers · Stable Diffusion 3 · +1 more