ZeRO

technique · active · zero-c2f1fe82 · 3 events · first seen 28d ago

Aliases: ZeRO

Recent events (3)

4 · Hugging Face Blog · 28d ago

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

This Hugging Face blog post from January 2021 covers the integration of ZeRO (Zero Redundancy Optimizer) memory optimization techniques into the Transformers training ecosystem via DeepSpeed and FairScale. ZeRO partitions optimizer states, gradients, and model parameters across data-parallel GPUs instead of replicating them on each one, which allows much larger models to be trained on the same hardware. The post serves as a practical guide for practitioners looking to scale model training without additional infrastructure investment.
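To make the effect of partitioning concrete, here is a rough sketch of per-GPU memory for model states at each ZeRO stage. It assumes the mixed-precision Adam accounting commonly used to describe ZeRO (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and ignores activations, buffers, and fragmentation. The function name and its arguments are hypothetical and are not taken from the blog post.

```python
def zero_model_state_gb(num_params, num_gpus, stage, optimizer_bytes_per_param=12):
    """Approximate per-GPU model-state memory in GB (1 GB = 1e9 bytes).

    Assumes fp16 parameters and gradients (2 bytes each) plus
    `optimizer_bytes_per_param` bytes of fp32 optimizer state per parameter.
    Stage 0 means plain data parallelism with full replication.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    param_bytes = 2 * num_params
    grad_bytes = 2 * num_params
    optimizer_bytes = optimizer_bytes_per_param * num_params

    # Each stage shards one more category of model state across the GPUs.
    if stage >= 1:
        optimizer_bytes /= num_gpus
    if stage >= 2:
        grad_bytes /= num_gpus
    if stage >= 3:
        param_bytes /= num_gpus

    return (param_bytes + grad_bytes + optimizer_bytes) / 1e9


if __name__ == "__main__":
    # Illustrative numbers only: a 7.5B-parameter model on 64 GPUs.
    for stage in range(4):
        gb = zero_model_state_gb(7.5e9, 64, stage)
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions, a 7.5B-parameter model on 64 GPUs needs roughly 120 GB per GPU with full replication, about 31.4 GB with optimizer states partitioned, about 16.6 GB with gradients also partitioned, and about 1.9 GB when parameters are partitioned as well.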

4 · Hugging Face Blog · 28d ago

Accelerate Large Model Training using DeepSpeed

This Hugging Face blog post explains how to use the Accelerate library in conjunction with DeepSpeed to train large language models more efficiently. It covers integration patterns, configuration options, and practical guidance for leveraging DeepSpeed's ZeRO optimization stages through the Accelerate abstraction layer. The post targets practitioners looking to scale model training without deep infrastructure expertise.
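As an illustration of the configuration options involved, the sketch below builds a minimal DeepSpeed-style JSON config that selects a ZeRO stage. The keys shown (`zero_optimization.stage`, `offload_optimizer`, `train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `fp16`) are standard DeepSpeed config fields, but the helper function, its defaults, and the chosen values are assumptions for illustration and are not taken from the post.

```python
import json


def build_zero_config(stage=2, micro_batch_size=8, grad_accum_steps=1,
                      offload_optimizer_to_cpu=False):
    """Return a minimal DeepSpeed-style config dict for a given ZeRO stage.

    Hypothetical helper: the values are placeholders, not recommendations.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    config = {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": stage},
    }
    if offload_optimizer_to_cpu:
        # Optionally keep optimizer states in CPU memory.
        config["zero_optimization"]["offload_optimizer"] = {"device": "cpu"}
    return config


config = build_zero_config(stage=2, offload_optimizer_to_cpu=True)
config_json = json.dumps(config, indent=2)

if __name__ == "__main__":
    print(config_json)
```

A file with this content could then be referenced from the Accelerate or DeepSpeed launch configuration; the exact wiring depends on the library versions in use.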

6 · arXiv · cs.AI · 7d ago

Piper: Programmable distributed training system decoupling parallelism strategy from runtime

Researchers present Piper, a distributed training system that separates parallelism strategy specification from low-level runtime execution via an intermediate representation (IR) — a unified global training DAG. Users declare strategies through model annotations and scheduling directives, which Piper compiles into per-device execution plans. The system matches performance on standard strategies like ZeRO while enabling additional gains through joint compute-communication scheduling in composed strategies such as DeepSeek-V3's DualPipe.