Entity · technique

ZeRO

technique · active · zero-c2f1fe82 · 3 events · first seen May 19, 2026

Aliases: ZeRO

Co-occurring entities

Microsoft DeepSpeed · Hugging Face · DeepSeek V4 · Piper · DualPipe · Meta AI · FairScale · Hugging Face Transformers · Hugging Face Accelerate

More like this (12)

DeepSpeed ZeRO Zed Zero Zeno LeRobot TerraZero BLOOMZ ZeroGPU AlphaStar Z3 Cubzh AlphaEvolve Rocket Money

Recent events (3)

6 · arXiv · cs.AI · Jun 10, 2026

Piper: Programmable distributed training system decoupling parallelism strategy from runtime

Researchers present Piper, a distributed training system that separates parallelism strategy specification from low-level runtime execution via an intermediate representation (IR) — a unified global training DAG. Users declare strategies through model annotations and scheduling directives, which Piper compiles into per-device execution plans. The system matches performance on standard strategies like ZeRO while enabling additional gains through joint compute-communication scheduling in composed strategies such as DeepSeek-V3's DualPipe.
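To make the "global DAG compiled into per-device execution plans" idea concrete, here is a minimal sketch. It is not Piper's IR or API: the `Op` class, `topo_order`, and `compile_plans` are hypothetical names, and device placement is assumed to come directly from a per-op annotation. The sketch topologically sorts the global graph and lowers every cross-device edge into a matching send/recv pair, without attempting the joint compute-communication scheduling the paper describes.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Op:
    """One node of a toy global training DAG (hypothetical, not Piper's IR)."""
    name: str
    device: int                                    # placement implied by a strategy annotation
    deps: list[str] = field(default_factory=list)  # names of producer ops


def topo_order(ops: dict[str, Op]) -> list[str]:
    """Kahn's algorithm: return op names so every producer precedes its consumers."""
    indeg = {name: len(op.deps) for name, op in ops.items()}
    consumers = defaultdict(list)
    for op in ops.values():
        for dep in op.deps:
            consumers[dep].append(op.name)
    ready = deque(name for name, d in indeg.items() if d == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for c in consumers[name]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    if len(order) != len(ops):
        raise ValueError("global DAG contains a cycle")
    return order


def compile_plans(ops: dict[str, Op]) -> dict[int, list[tuple]]:
    """Lower the global DAG into per-device instruction lists.

    Each cross-device edge becomes a send on the producer's device and a
    matching recv on the consumer's device. No compute/communication overlap.
    """
    plans: dict[int, list[tuple]] = defaultdict(list)
    transferred: set[tuple[str, int]] = set()  # (tensor, destination) pairs already sent
    for name in topo_order(ops):
        op = ops[name]
        for dep in op.deps:
            src = ops[dep]
            if src.device != op.device and (dep, op.device) not in transferred:
                plans[src.device].append(("send", dep, op.device))
                plans[op.device].append(("recv", dep, src.device))
                transferred.add((dep, op.device))
        plans[op.device].append(("compute", name))
    return dict(plans)


# Usage: a two-stage pipeline split, embed/layer1 on device 0, layer2/loss on device 1.
graph = {
    "embed": Op("embed", 0),
    "layer1": Op("layer1", 0, ["embed"]),
    "layer2": Op("layer2", 1, ["layer1"]),
    "loss": Op("loss", 1, ["layer2"]),
}
for dev, plan in sorted(compile_plans(graph).items()):
    print(dev, plan)
# 0 [('compute', 'embed'), ('compute', 'layer1'), ('send', 'layer1', 1)]
# 1 [('recv', 'layer1', 0), ('compute', 'layer2'), ('compute', 'loss')]
```

Keeping the strategy (the `device` annotations) separate from the lowering pass is the point of the sketch: changing placement changes only the input graph, while `compile_plans` stays the same.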

Training Infrastructure · Frontier Model Releases · DeepSeek V4 · Piper · DualPipe

4 · Hugging Face Blog · May 19, 2026

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

This Hugging Face blog post from January 2021 covers the integration of ZeRO (Zero Redundancy Optimizer) memory optimizations, via DeepSpeed and FairScale, into the Transformers training ecosystem. ZeRO partitions optimizer states, gradients, and model parameters across GPUs (stages 1, 2, and 3 respectively, each stage adding the next item), so each GPU stores only a shard of these states instead of a full replica. This lets much larger models train on the same hardware. The post serves as a practical guide for practitioners looking to scale model training without additional infrastructure investment.
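The memory savings from each stage can be estimated with the accounting used in the original ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and K = 12 for fp32 optimizer state (master weights, momentum, variance). The sketch below assumes that model and counts model states only, ignoring activations, temporary buffers, and fragmentation; the function name is illustrative, and stage 0 follows DeepSpeed's convention of meaning "ZeRO disabled".

```python
def zero_model_state_bytes(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU bytes for model states (not activations) under ZeRO stage 0-3.

    Assumes mixed-precision Adam: 2 B/param fp16 weights, 2 B/param fp16
    gradients, and k B/param of fp32 optimizer state.
    """
    weights = 2 * num_params
    grads = 2 * num_params
    optim = k * num_params
    if stage == 0:   # plain data parallelism: every GPU holds a full replica
        return weights + grads + optim
    if stage == 1:   # shard optimizer states
        return weights + grads + optim / num_gpus
    if stage == 2:   # also shard gradients
        return weights + (grads + optim) / num_gpus
    if stage == 3:   # also shard parameters
        return (weights + grads + optim) / num_gpus
    raise ValueError("stage must be 0, 1, 2 or 3")


# Usage: a 7.5B-parameter model data-parallel across 64 GPUs.
for s in range(4):
    gb = zero_model_state_bytes(7.5e9, 64, s) / 1e9
    print(f"stage {s}: {gb:.1f} GB per GPU")
# stage 0: 120.0 GB per GPU
# stage 1: 31.4 GB per GPU
# stage 2: 16.6 GB per GPU
# stage 3: 1.9 GB per GPU
```

Most of the per-GPU footprint is the K-byte optimizer state, which is why sharding it alone (stage 1) already removes roughly three quarters of the replicated memory in this example.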

Training Infrastructure · Inference Economics · Meta AI · Microsoft DeepSpeed

4 · Hugging Face Blog · May 19, 2026

Accelerate Large Model Training using DeepSpeed

This Hugging Face blog post explains how to use the Accelerate library in conjunction with DeepSpeed to train large language models more efficiently. It covers integration patterns, configuration options, and practical guidance for leveraging DeepSpeed's ZeRO optimization stages through the Accelerate abstraction layer. The post targets practitioners looking to scale model training without deep infrastructure expertise.
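As a rough illustration of selecting a ZeRO stage through configuration, the sketch below builds a minimal DeepSpeed config dict. The keys (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `bf16`, `zero_optimization.stage`) are standard DeepSpeed config fields, but the helper `make_deepspeed_config` and its defaults are hypothetical, and the blog post may instead configure things interactively via `accelerate config`. The hand-off to Accelerate is shown only in comments, assuming a recent Accelerate release with DeepSpeed installed.

```python
import json


def make_deepspeed_config(zero_stage: int, micro_batch_per_gpu: int = 4,
                          grad_accum_steps: int = 1, bf16: bool = True) -> dict:
    """Build a minimal DeepSpeed config dict that selects a ZeRO stage (hypothetical helper)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2 or 3")
    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": bf16},
        "zero_optimization": {"stage": zero_stage},  # 1: optimizer states, 2: +gradients, 3: +parameters
    }


cfg = make_deepspeed_config(zero_stage=2)
print(json.dumps(cfg, indent=2))

# Assumed hand-off to Accelerate (not executed here; needs accelerate + deepspeed):
# from accelerate import Accelerator, DeepSpeedPlugin
# accelerator = Accelerator(deepspeed_plugin=DeepSpeedPlugin(hf_ds_config=cfg))
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

Building the config as plain data keeps the training script unchanged across stages: moving from stage 2 to stage 3 changes one value rather than any model code.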

Training Infrastructure · Agent and Tool Ecosystem · Microsoft DeepSpeed · Hugging Face