Almanac
product

Megatron-LM

product · active · megatron-lm-eb59f34c · 3 events · first seen 28d ago

Aliases: Megatron-LM

Recent events (3)

6 · arXiv · cs.LG · 28d ago

RRFP: A Readiness-Driven Runtime for Pipeline-Parallel Training Under Runtime Variability

The paper introduces Runtime-Readiness-First Pipeline (RRFP), a new runtime for pipeline-parallel large-model training that treats schedules as non-binding hint orders rather than strict execution sequences. By combining message-driven asynchronous communication, lightweight tensor-parallel coordination, and ready-set arbitration, RRFP dynamically dispatches work based on actual task readiness, reducing idle bubbles and stage misalignment. Implemented on a Megatron-based framework and evaluated at up to 128 GPUs, RRFP achieves up to 1.77× speedup on language-only workloads and 2.77× on multimodal workloads versus fixed-order baselines, and outperforms the fastest comparable external system by up to 1.84×.
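The idea of treating a schedule as a non-binding hint order can be illustrated with a toy dispatcher. This is a minimal sketch and not the RRFP implementation: the function name `dispatch_order`, the task labels, and the one-task-per-slot model are all assumptions made for illustration. In each slot the dispatcher runs whichever ready task appears earliest in the hint order, rather than stalling until the next scheduled task is ready.

```python
from typing import Dict, Iterable, List, Set


def dispatch_order(hint_order: List[str], arrivals: Iterable[Set[str]]) -> List[str]:
    """Toy ready-set arbitration (hypothetical, for illustration only).

    hint_order: the planned schedule, used only as a tie-breaking priority.
    arrivals:   for each dispatch slot, the set of tasks that become ready.
    Returns the order in which tasks actually run (at most one per slot).
    """
    rank: Dict[str, int] = {task: i for i, task in enumerate(hint_order)}

    def priority(task: str) -> int:
        # Tasks missing from the hint order run after all hinted tasks.
        return rank.get(task, len(rank))

    ready: Set[str] = set()
    executed: List[str] = []
    for newly_ready in arrivals:
        ready |= newly_ready
        if ready:
            # Run the ready task the hint ranks earliest; never wait
            # for a task that is not ready yet.
            task = min(ready, key=priority)
            ready.remove(task)
            executed.append(task)
    # Drain anything still ready after the last arrival slot.
    while ready:
        task = min(ready, key=priority)
        ready.remove(task)
        executed.append(task)
    return executed


# Hypothetical stage schedule: F = forward, B = backward micro-batch tasks.
hint = ["F0", "F1", "B0", "F2", "B1", "B2"]
# B0 becomes ready before F1, so a strict-order runtime would idle in slot 2.
arrivals = [{"F0"}, {"B0"}, {"F1", "F2"}, set(), {"B1"}, {"B2"}]
actual = dispatch_order(hint, arrivals)
print(actual)
```

In this made-up trace the dispatcher runs `B0` in the second slot instead of idling until `F1` arrives, which is the kind of bubble a fixed-order schedule would incur.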

6 · Hugging Face Blog · 28d ago

The Technology Behind BLOOM Training

This Hugging Face blog post details the infrastructure and training methodology used to train BLOOM, a 176-billion-parameter open-access multilingual language model. It covers the use of Megatron-DeepSpeed for distributed training across hundreds of GPUs, combining tensor parallelism, pipeline parallelism, and data parallelism. The post also discusses hardware setup, memory optimization techniques, and lessons learned during the large-scale training run.
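When tensor, pipeline, and data parallelism are combined, the three degrees multiply to the total GPU count. The sketch below shows that arithmetic. The function name `data_parallel_size` and the example figures (64 GPUs, tensor-parallel 4, pipeline-parallel 4) are illustrative assumptions, not BLOOM's actual configuration.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the number of data-parallel replicas for a 3D-parallel layout.

    Each model replica spans tensor_parallel * pipeline_parallel GPUs, so
    the GPU count must be an exact multiple of that product.
    """
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if gpus_per_replica <= 0 or world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"TP*PP={gpus_per_replica}"
        )
    return world_size // gpus_per_replica


# Hypothetical cluster: 64 GPUs, each replica split 4 ways by tensor
# parallelism and 4 ways by pipeline parallelism.
replicas = data_parallel_size(world_size=64, tensor_parallel=4, pipeline_parallel=4)
print(replicas)  # 4
```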

3 · GitHub Trending · 20d ago

NVIDIA NeMo Megatron-Bridge: Bidirectional Hugging Face Conversion for Megatron-Based Training

Megatron-Bridge is an NVIDIA NeMo training library for Megatron-based models that supports bidirectional conversion between Megatron and Hugging Face formats. The repository has accumulated 670 stars with modest daily growth (+5). It addresses a practical interoperability gap between the high-performance Megatron training stack and the broader Hugging Face ecosystem.