Entity · product

DeepSpeed

product · active · deepspeed-5342eace · 6 events · first seen May 18, 2026

Aliases: DeepSpeed

Co-occurring entities

Hugging Face · Hugging Face Accelerate · Microsoft · ZeRO · BLOOM · Meta AI · FairScale · Hugging Face Transformers · BigScience · Megatron-LM · DeepSpeed ZeRO · PyTorch FSDP · Ulysses Sequence Parallelism

More like this (12)

DeepSpeed ZeRO · DeepMind · DeepSWE · DeepInfra · DeepSeek-V3.2-Speciale · DeepSeek-V4-Flash · DeepSeek API · DeepMath · DeepSeek-V3.1-Base · DeepSeek-V2.5-1210 · DeepSeek V4 · DeepSeek-V3-0324

Recent events (6)

4 · Hugging Face Blog · May 19, 2026

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

This Hugging Face blog post from January 2021 covers the integration of ZeRO (Zero Redundancy Optimizer) memory optimizations into the Transformers training ecosystem via DeepSpeed and FairScale. ZeRO partitions optimizer states, gradients, and model parameters across data-parallel GPUs instead of replicating them on every device, so much larger models fit on the same hardware. The post serves as a practical guide for practitioners looking to scale model training without additional infrastructure investment.

Training Infrastructure · Inference Economics · Meta AI · Microsoft · DeepSpeed · +4 more
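The partitioning described in the summary above can be made concrete with a small memory estimate. This is a minimal sketch, assuming mixed-precision Adam with the commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state; the function name is hypothetical, and activations and buffers are ignored.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory in bytes (assumed 2+2+12 bytes/param)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2.0 * num_params   # fp16 parameters
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 master weights + Adam momentum + variance

    if stage >= 1:              # stage 1 partitions optimizer states
        optim /= num_gpus
    if stage >= 2:              # stage 2 also partitions gradients
        grads /= num_gpus
    if stage >= 3:              # stage 3 also partitions parameters
        params /= num_gpus
    return params + grads + optim


if __name__ == "__main__":
    for stage in range(4):
        gb = zero_model_state_bytes(7_500_000_000, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions, each stage shards one more category of state, which is why stage 3 scales per-GPU model-state memory roughly inversely with the number of GPUs.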

4 · Hugging Face Blog · May 19, 2026

Accelerate Large Model Training using DeepSpeed

This Hugging Face blog post explains how to use the Accelerate library in conjunction with DeepSpeed to train large language models more efficiently. It covers integration patterns, configuration options, and practical guidance for leveraging DeepSpeed's ZeRO optimization stages through the Accelerate abstraction layer. The post targets practitioners looking to scale model training without deep infrastructure expertise.

Training Infrastructure · Agent and Tool Ecosystem · Microsoft · DeepSpeed · Hugging Face · +2 more
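To illustrate the kind of configuration the summary above refers to, here is a minimal sketch that builds a DeepSpeed-style JSON config selecting a ZeRO stage. The helper name and default values are hypothetical; the keys shown (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `bf16`, `zero_optimization`) are standard DeepSpeed config fields, but how Accelerate consumes such a file is not shown here.

```python
import json


def build_ds_config(zero_stage: int,
                    micro_batch_size: int = 1,
                    grad_accum_steps: int = 1,
                    offload_optimizer: bool = False) -> dict:
    """Return a small DeepSpeed-style config dict (illustrative values only)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    if offload_optimizer and zero_stage == 0:
        raise ValueError("optimizer offload needs a ZeRO stage of 1 or higher")

    zero = {"stage": zero_stage}
    if offload_optimizer:
        # Moves optimizer state to host memory to save GPU memory.
        zero["offload_optimizer"] = {"device": "cpu"}

    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": True},
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    print(json.dumps(build_ds_config(3, offload_optimizer=True), indent=2))
```

Generating the config from code rather than editing JSON by hand makes it easier to sweep ZeRO stages while keeping the other settings fixed.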

6 · Hugging Face Blog · May 19, 2026

The Technology Behind BLOOM Training

This Hugging Face blog post details the infrastructure and training methodology used to train BLOOM, a 176-billion-parameter open-access multilingual language model. It covers the use of Megatron-DeepSpeed for distributed training across 384 NVIDIA A100 80GB GPUs, combining tensor parallelism, pipeline parallelism, and data parallelism. The post also discusses hardware setup, memory optimization techniques, and lessons learned during the large-scale training run.

Training Infrastructure · Open Weights Progress · BLOOM · DeepSpeed · Hugging Face · +2 more
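The combination of tensor, pipeline, and data parallelism mentioned in the summary above can be pictured as a 3D grid of GPUs. The sketch below maps a global rank to grid coordinates using the degrees reported for BLOOM (tensor parallel 4, pipeline parallel 12, data parallel 8, so 384 GPUs). The rank ordering, with the tensor-parallel index varying fastest, is an assumption for illustration and not necessarily the layout Megatron-DeepSpeed uses.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def world_size(tp_size: int, pp_size: int, dp_size: int) -> int:
    """Total GPUs needed for a tp x pp x dp grid."""
    return tp_size * pp_size * dp_size


def rank_to_coord(rank: int, tp_size: int, pp_size: int, dp_size: int) -> Coord:
    """Map a global rank to (dp, pp, tp), assuming tp varies fastest."""
    if not 0 <= rank < world_size(tp_size, pp_size, dp_size):
        raise ValueError("rank outside the grid")
    tp = rank % tp_size
    pp = (rank // tp_size) % pp_size
    dp = rank // (tp_size * pp_size)
    return Coord(dp=dp, pp=pp, tp=tp)


if __name__ == "__main__":
    print(world_size(4, 12, 8))
    print(rank_to_coord(383, 4, 12, 8))
```

Keeping tensor-parallel peers at adjacent ranks is a common choice because tensor parallelism communicates most often and benefits from staying within one node.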

5 · Hugging Face Blog · May 19, 2026

Incredibly Fast BLOOM Inference with DeepSpeed and Accelerate

This Hugging Face blog post details inference optimization techniques for the 176B-parameter BLOOM model using DeepSpeed-Inference, DeepSpeed ZeRO, and Hugging Face Accelerate. The post provides PyTorch scripts and benchmarks demonstrating significant throughput improvements through tensor parallelism and other optimizations. It serves as a practical guide for deploying large open-weight models efficiently across multiple GPUs.

Training Infrastructure · Open Weights Progress · BLOOM · DeepSpeed · Hugging Face · +3 more
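The tensor parallelism mentioned in the summary above splits individual weight matrices across GPUs. The following is a toy, pure-Python sketch of a column-parallel matrix multiply: each simulated device holds a slice of the weight columns, computes its partial output, and the slices are concatenated. All function names are hypothetical, and real implementations run the shards on separate GPUs with fused kernels and collective communication.

```python
def matmul(x, w):
    """Plain matrix multiply on nested lists: (n x k) @ (k x m)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split weight matrix w into `parts` equal column blocks."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_matmul(x, w, parts):
    """Compute x @ w by multiplying against each column shard separately."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]  # one per simulated device
    # Concatenating along columns stands in for an all-gather across devices.
    return [[v for p in partials for v in p[r]] for r in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_matmul(x, w, parts=2) == matmul(x, w))
```

Each device stores only its share of the weight columns, so per-device weight memory drops with the number of shards while the result matches the unsharded product.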

4 · Hugging Face Blog · May 19, 2026

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

This Hugging Face blog post covers the practical migration path between DeepSpeed and PyTorch FSDP distributed training backends using the Accelerate library. It addresses configuration differences, compatibility considerations, and workflow patterns for switching between the two frameworks. The post targets practitioners running large-scale model training who need flexibility across distributed training strategies.

Training Infrastructure · PyTorch FSDP · DeepSpeed · Hugging Face · +1 more
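One configuration difference of the kind the summary above alludes to is how each backend names its sharding level. The sketch below encodes the rough correspondence that is commonly cited between DeepSpeed ZeRO stages and PyTorch FSDP sharding strategies. The strategy names match members of PyTorch's `ShardingStrategy` enum, but they are kept as plain strings here so the example runs without PyTorch. The helper itself is hypothetical, and the mapping is an approximation rather than an exact equivalence.

```python
# Approximate correspondence between ZeRO stages and FSDP sharding strategies.
ZERO_TO_FSDP = {
    0: "NO_SHARD",        # plain data parallelism, nothing partitioned
    2: "SHARD_GRAD_OP",   # gradients and optimizer state partitioned
    3: "FULL_SHARD",      # parameters, gradients, and optimizer state partitioned
}


def fsdp_strategy_for_zero_stage(stage: int) -> str:
    """Return the closest FSDP sharding strategy name for a ZeRO stage."""
    if stage == 1:
        # Assumption: stage 1 (optimizer state only) has no direct FSDP counterpart.
        raise ValueError("ZeRO stage 1 has no direct FSDP sharding strategy")
    try:
        return ZERO_TO_FSDP[stage]
    except KeyError:
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3") from None


if __name__ == "__main__":
    for stage in (0, 2, 3):
        print(stage, "->", fsdp_strategy_for_zero_stage(stage))
```

Even with matching sharding levels, the two backends can differ in details such as precision handling, so equivalent settings do not guarantee identical training behavior.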

5 · Hugging Face Blog · May 18, 2026

Ulysses Sequence Parallelism: Training with Million-Token Contexts

Hugging Face published a blog post on Ulysses sequence parallelism, a technique for distributing long-context training across multiple devices by partitioning the sequence dimension. The post covers how Ulysses enables training with million-token context windows by reducing per-device memory requirements. This is relevant to the ongoing challenge of scaling transformer training to very long sequences efficiently.

Training Infrastructure · Long Context Evolution · Ulysses Sequence Parallelism · DeepSpeed · Hugging Face
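The sequence partitioning described in the summary above can be simulated without any GPUs. In this minimal sketch, each device starts with a contiguous slice of tokens for all attention heads. An all-to-all exchange then regroups the data so each device holds the full sequence for a subset of heads, which is the layout attention needs. Function names are hypothetical, and `(token, head)` tuples stand in for real activation tensors.

```python
def shard_sequence(seq_len: int, num_heads: int, world: int):
    """Give each device a contiguous token slice with all heads."""
    if seq_len % world or num_heads % world:
        raise ValueError("seq_len and num_heads must be divisible by world size")
    chunk = seq_len // world
    return [
        [(t, h) for t in range(r * chunk, (r + 1) * chunk) for h in range(num_heads)]
        for r in range(world)
    ]


def all_to_all_seq_to_head(shards, num_heads: int):
    """Simulate the all-to-all: regroup so device i owns head block i for all tokens."""
    world = len(shards)
    heads_per_device = num_heads // world
    out = [[] for _ in range(world)]
    for src in shards:
        for token, head in src:
            out[head // heads_per_device].append((token, head))
    return out


if __name__ == "__main__":
    shards = shard_sequence(seq_len=8, num_heads=4, world=2)
    regrouped = all_to_all_seq_to_head(shards, num_heads=4)
    for rank, items in enumerate(regrouped):
        tokens = sorted({t for t, _ in items})
        heads = sorted({h for _, h in items})
        print(f"device {rank}: tokens {tokens}, heads {heads}")
```

The number of elements per device is the same before and after the exchange, which reflects why per-device activation memory shrinks as more devices share the sequence.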