6 · Hugging Face Blog · 1mo ago

The Technology Behind BLOOM Training

This Hugging Face blog post details the infrastructure and training methodology behind BLOOM, a 176-billion-parameter open-access multilingual language model. It covers the use of Megatron-DeepSpeed for distributed training across 384 NVIDIA A100 80GB GPUs, combining tensor parallelism, pipeline parallelism, and data parallelism. The post also discusses the hardware setup, memory optimization techniques, and lessons learned during the large-scale training run.

Training Infrastructure · Open Weights Progress · BLOOM · DeepSpeed · Hugging Face · BigScience · Megatron-LM

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI

Related events (8)

8 · Hugging Face Blog · 1mo ago

Introducing BLOOM: The World's Largest Open Multilingual Language Model

Hugging Face and the BigScience workshop released BLOOM, a 176-billion-parameter open-access multilingual language model trained on 46 natural languages and 13 programming languages. The model was developed collaboratively by more than 1,000 researchers and is a significant milestone in open-weights large language model development. BLOOM was designed to be freely accessible to researchers and practitioners, in contrast to proprietary models of similar scale.

Frontier Model Releases · Open Weights Progress · BLOOM · Hugging Face · BigScience · +1 more

4 · Hugging Face Blog · 1mo ago

Optimization story: Bloom inference

This Hugging Face blog post documents practical inference optimization techniques applied to the BLOOM large language model. It covers strategies for reducing latency and memory footprint during deployment, likely including quantization, tensor parallelism, and batching approaches. The post serves as a technical case study for serving very large open-weights models efficiently.

Open Weights Progress · Inference Economics · BLOOM · Hugging Face

5 · Hugging Face Blog · 1mo ago

Incredibly Fast BLOOM Inference with DeepSpeed and Accelerate

This Hugging Face blog post details inference optimization techniques for the 176-billion-parameter BLOOM model using DeepSpeed-Inference, DeepSpeed ZeRO, and Hugging Face Accelerate. The post provides PyTorch scripts and benchmarks demonstrating significant throughput improvements through tensor parallelism and other optimizations. It serves as a practical guide to deploying large open-weights models efficiently across multiple GPUs.

Training Infrastructure · Open Weights Progress · BLOOM · DeepSpeed · Hugging Face · +3 more

4 · Hugging Face Blog · 1mo ago

Fast Inference on Large Language Models: BLOOMZ on Habana Gaudi2 Accelerator

This Hugging Face blog post covers deploying BLOOMZ, a large multilingual language model, on Intel's Habana Gaudi2 accelerator for inference. It benchmarks throughput and latency performance on Gaudi2 as an alternative to GPU-based inference. The post is part of ongoing work to demonstrate non-NVIDIA hardware options for large model deployment.

Training Infrastructure · Open Weights Progress · BLOOMZ · Habana Gaudi · BLOOM · +3 more

4 · Hugging Face Blog · 1mo ago

Deep Learning over the Internet: Training Language Models Collaboratively

This Hugging Face blog post describes a framework for training large language models collaboratively on volunteer compute contributed over the internet. The approach lets distributed participants with heterogeneous hardware train a model jointly without centralized infrastructure. It is an early exploration of decentralized training as an alternative to large-scale private compute clusters.

Training Infrastructure · Open Weights Progress · collaborative distributed training · Hugging Face · volunteer compute

4 · Hugging Face Blog · 1mo ago

Federated Learning using Hugging Face and Flower

This Hugging Face blog post describes how to combine the Hugging Face ecosystem with the Flower federated learning framework to train models across distributed data silos while keeping the data private. It provides a practical walkthrough of integrating the Transformers and Datasets libraries with Flower's federated training loop. The post targets practitioners looking to apply federated learning to NLP and other ML tasks without centralizing sensitive data.

Training Infrastructure · Enterprise Deployment Patterns · Federated Learning · Hugging Face Datasets · Hugging Face Transformers · +2 more

5 · Hugging Face Blog · 1mo ago

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

This Hugging Face blog post provides a technical guide for fine-tuning Microsoft's Florence-2 vision-language models. Florence-2 is a compact yet capable multimodal model supporting tasks like captioning, object detection, and OCR. The post covers practical implementation details for adapting the model to custom datasets using the Hugging Face ecosystem.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Microsoft · Hugging Face · Florence-2 · +1 more

3 · Hugging Face Blog · 1mo ago

Pre-Train BERT with Hugging Face Transformers and Habana Gaudi

This Hugging Face blog post from August 2022 describes how to pre-train a BERT model from scratch using the Hugging Face Transformers library on Habana Gaudi hardware accelerators. It covers the full pipeline including data preparation, tokenizer training, and masked language modeling pretraining. The post serves as both a technical tutorial and a demonstration of Habana Gaudi's viability as an alternative AI training accelerator.

Training Infrastructure · Habana Gaudi · Hugging Face Transformers · Hugging Face · +2 more