4 · Hugging Face Blog · 1mo ago

PRX Part 3 — Training a Text-to-Image Model in 24 Hours

Photoroom shares the third installment of their PRX series on Hugging Face, detailing how they trained a text-to-image model within a 24-hour window. The post covers the practical engineering and training infrastructure decisions that enabled rapid model development. This is part of an ongoing series documenting Photoroom's internal model development process.

Training Infrastructure · Multimodal Progress · Hugging Face · Photoroom · PRX

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI

Related events (8)

4 · Hugging Face Blog · 1mo ago

Training Design for Text-to-Image Models: Lessons from Ablations

Photoroom shares practical lessons from ablation studies on training design choices for text-to-image diffusion models. The post covers decisions around data curation, model architecture, and training hyperparameters derived from systematic experimentation. This is part two of a series documenting Photoroom's internal research into building production-grade image generation systems.

Training Infrastructure · Multimodal Progress · Hugging Face · Photoroom · PRX

5 · Hugging Face Blog · 1mo ago

Zero-shot image-to-text generation with BLIP-2

Hugging Face published a blog post introducing BLIP-2, a multimodal model that enables zero-shot image-to-text generation by bridging frozen image encoders and large language models via a lightweight Querying Transformer (Q-Former). The post covers the model's architecture, capabilities, and how to use it via the Hugging Face Transformers library. BLIP-2 achieves strong performance on visual question answering and image captioning tasks without task-specific fine-tuning.

Open Weights Progress · Agent and Tool Ecosystem · Q-Former · Salesforce Research · Hugging Face Transformers · +3 more

4 · Hugging Face Blog · 1mo ago

A Dive into Text-to-Video Models

A Hugging Face blog post that gives an overview of text-to-video generation models as of mid-2023. The post surveys the approaches, architectures, and key models in the emerging text-to-video space. It is a commentary piece that synthesizes existing work rather than presenting novel research.

Multimodal Progress · text-to-video generation · Hugging Face

3 · Hugging Face Blog · 1mo ago

Introducing TextImage Augmentation for Document Images

Hugging Face introduces a TextImage augmentation library for document images, aimed at improving model robustness for document understanding tasks. The tooling applies transformations such as noise, blur, and distortion to document images to simulate real-world scanning and printing artifacts. This is relevant to training and fine-tuning vision-language models on document datasets.

Agent and Tool Ecosystem · Hugging Face · TextImage Augmentation

6 · Hugging Face Blog · 1mo ago

The Technology Behind BLOOM Training

This Hugging Face blog post details the infrastructure and training methodology used to train BLOOM, a 176-billion-parameter open-access multilingual language model. It covers the use of Megatron-DeepSpeed for distributed training across hundreds of GPUs, combining tensor parallelism, pipeline parallelism, and data parallelism. The post also discusses the hardware setup, memory optimization techniques, and lessons learned during the large-scale training run.

Training Infrastructure · Open Weights Progress · BLOOM · DeepSpeed · Hugging Face · +2 more

3 · Hugging Face Blog · 1mo ago

Training a Language Model with Hugging Face Transformers Using TensorFlow and TPUs

This Hugging Face blog post provides a technical walkthrough for training a language model using TensorFlow and Google TPUs via the Transformers library. It covers the practical setup, data pipeline, and training configuration required to leverage TPU hardware with the TF ecosystem. The post serves as a tutorial bridging Hugging Face tooling with TPU-based infrastructure.

Training Infrastructure · Agent and Tool Ecosystem · Google TPU · Hugging Face Transformers · Hugging Face · +1 more

3 · Hugging Face Blog · 1mo ago

Pre-Train BERT with Hugging Face Transformers and Habana Gaudi

This Hugging Face blog post from August 2022 describes how to pre-train a BERT model from scratch using the Hugging Face Transformers library on Habana Gaudi hardware accelerators. It covers the full pipeline including data preparation, tokenizer training, and masked language modeling pretraining. The post serves as both a technical tutorial and a demonstration of Habana Gaudi's viability as an alternative AI training accelerator.

Training Infrastructure · Habana Gaudi · Hugging Face Transformers · Hugging Face · +2 more

3 · Hugging Face Blog · 1mo ago

Training CodeParrot from Scratch

Hugging Face published a detailed walkthrough of training CodeParrot, a GPT-2-style language model trained from scratch on GitHub code data. The post covers dataset preparation, tokenizer training, model configuration, and distributed training setup using the Accelerate library. It serves as both a technical tutorial and a demonstration of open-source code generation model development practices circa late 2021.

Training Infrastructure · Open Weights Progress · GitHub Code Dataset · CodeParrot · GPT-2 · +2 more