Open Preference Dataset for Text-to-Image Generation by the Hugging Face Community
Hugging Face has released an open preference dataset for text-to-image generation, collected through community participation. The dataset records human preference judgments over generated images and is intended to support alignment and reward-modeling research for image generation models. It adds to the growing ecosystem of open datasets for training and evaluating generative image models.
Related events (8)
Build Awesome Datasets for Video Generation
Hugging Face published a blog post on constructing high-quality datasets for video generation models. The post likely covers data collection, preprocessing, and curation pipelines relevant to training video diffusion or generation systems. This is a practical tooling and methodology guide aimed at practitioners working on video AI.
Introducing the Synthetic Data Generator - Build Datasets with Natural Language
Hugging Face has launched a Synthetic Data Generator tool that allows users to create datasets using natural language descriptions. The tool is designed to lower the barrier for dataset creation, enabling practitioners to generate training data without writing code. This is relevant to the broader trend of synthetic data as a scalable alternative to manual data collection and annotation.
Introducing TextImage Augmentation for Document Images
Hugging Face introduces a TextImage augmentation library for document images, aimed at improving model robustness for document understanding tasks. The tooling applies transformations such as noise, blur, and distortion to document images to simulate real-world scanning and printing artifacts. This is relevant to training and fine-tuning vision-language models on document datasets.
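To make the kinds of transformations mentioned above concrete, here is a minimal sketch of two of them (additive noise and blur) on a grayscale image stored as a nested list of 0-255 integers. This is not the API of the library described in the post; the function names `add_gaussian_noise` and `box_blur` and their parameters are hypothetical and chosen only for illustration.

```python
import random


def add_gaussian_noise(image, sigma=10.0, seed=None):
    """Add zero-mean Gaussian noise to each pixel, clamped to [0, 255].

    Hypothetical helper: roughly imitates sensor or scanner noise.
    """
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(px + rng.gauss(0.0, sigma)))) for px in row]
        for row in image
    ]


def box_blur(image):
    """Replace each pixel with the mean of its 3x3 neighborhood.

    Hypothetical helper: a crude stand-in for out-of-focus scans.
    Border pixels average over the neighbors that exist.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbors = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(round(sum(neighbors) / len(neighbors)))
        blurred.append(row)
    return blurred


# Usage: a tiny synthetic "page" with one dark stroke on a white background.
page = [[255, 255, 255], [0, 0, 0], [255, 255, 255]]
augmented = add_gaussian_noise(box_blur(page), sigma=5.0, seed=0)
```

A real pipeline would typically operate on array or tensor images and randomize which transformations are applied per sample; the nested-list form here only keeps the sketch dependency-free.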
GPIC: Stanford Releases 28-Trillion-Pixel Permissively Licensed Image Corpus for Visual Generation Research
Stanford Vision Lab introduces GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels comprising 100M training, 200K validation, and 1M test images, all permissively licensed for research and commercial use. Images are captioned by a state-of-the-art vision-language model, safety-filtered, deduplicated, and hosted on Hugging Face. The release includes a benchmarking protocol for generative modeling and a reference baseline using pixel-space flow matching. The dataset addresses a key gap in scalable visual generative modeling research by providing a large, stable, and openly licensed resource.
Preference Optimization for Vision Language Models
This Hugging Face blog post covers the application of Direct Preference Optimization (DPO) to vision-language models (VLMs). It likely discusses how preference learning techniques originally developed for text-only LLMs can be adapted to multimodal settings. The post addresses training methodology for aligning VLMs with human preferences across both visual and textual modalities.
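For reference, the standard DPO objective for a single preference pair can be sketched as below. This is a minimal illustration of the published DPO loss, not code from the blog post; the function name `dpo_loss` and its argument names are assumptions. In a vision-language setting, each log-probability would be the summed token log-probability of a response conditioned on both the image and the prompt.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the total log-probability of a response under the
    trained policy or the frozen reference model. `beta` scales how
    strongly the policy is pushed away from the reference.
    """
    # Implicit rewards: how much more likely the policy makes each
    # response compared with the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Usage: the policy favors the chosen response more than the reference does,
# so the loss falls below log(2), its value when both models agree.
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=0.1)
```

In practice the loss is averaged over a batch and computed with a tensor library so gradients can flow into the policy; the scalar form above only shows the arithmetic.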
LeRobot Community Datasets: The "ImageNet" of Robotics — When and How?
Hugging Face's LeRobot blog post discusses the vision and current state of building a large-scale community robotics dataset analogous to ImageNet for computer vision. The post examines what it would take to create a standardized, scalable dataset repository for robot learning, drawing on the LeRobot ecosystem. It addresses data collection formats, community contribution workflows, and the open challenges in making such a resource practically useful for training generalizable robot policies.
State of open video generation models in Diffusers
Hugging Face published a survey of open-source video generation models integrated into the Diffusers library as of January 2025. The post covers the current landscape of available open video generation models, their capabilities, and how they are supported within the Diffusers ecosystem. This serves as a reference for practitioners looking to use or compare open-weights video generation models.
Welcome aMUSEd: Efficient Text-to-Image Generation
Hugging Face introduces aMUSEd, a text-to-image model based on the MUSE architecture that prioritizes efficiency over raw quality. The model is designed to be smaller and faster than diffusion-based alternatives, making it more accessible for deployment. It is released with integration into the Diffusers library.


