Runtime-Readiness-First Pipeline (RRFP)

technique · active · runtime-readiness-first-pipeline-rrfp--55f9db05 · 1 event · first seen 28d ago

Aliases: Runtime-Readiness-First Pipeline (RRFP)

Recent events (1)

6 · arXiv · cs.LG · 28d ago

RRFP: A Readiness-Driven Runtime for Pipeline-Parallel Training Under Runtime Variability

The paper introduces Runtime-Readiness-First Pipeline (RRFP), a new runtime for pipeline-parallel large-model training that treats schedules as non-binding hint orders rather than strict execution sequences. By combining message-driven asynchronous communication, lightweight tensor-parallel coordination, and ready-set arbitration, RRFP dynamically dispatches work based on actual task readiness, reducing idle bubbles and stage misalignment. Implemented on a Megatron-based framework and evaluated at up to 128 GPUs, RRFP achieves up to 1.77× speedup on language-only workloads and 2.77× on multimodal workloads versus fixed-order baselines, and outperforms the fastest comparable external system by up to 1.84×.
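
To make the idea of a non-binding hint order concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: it models a single pipeline stage, ignores tensor-parallel coordination and asynchronous communication entirely, and every name in it (`fixed_order_makespan`, `readiness_first_makespan`, `HINT_ORDER`, `TASKS`) is hypothetical. Each task has a time at which its input arrives and a compute duration. The fixed-order runner blocks on the next task in the schedule, while the readiness-first runner picks, among the tasks that are already ready, the one that appears earliest in the hint order.

```python
from typing import Dict, List, Tuple

# (ready_time, duration) for each task; an assumed toy model, not RRFP's data structures.
Task = Tuple[float, float]


def fixed_order_makespan(hint_order: List[str], tasks: Dict[str, Task]) -> float:
    """Run tasks strictly in schedule order, idling until each one is ready."""
    clock = 0.0
    for name in hint_order:
        ready_at, duration = tasks[name]
        clock = max(clock, ready_at) + duration
    return clock


def readiness_first_makespan(
    hint_order: List[str], tasks: Dict[str, Task]
) -> Tuple[float, List[str]]:
    """Treat the schedule as a hint: dispatch the ready task ranked earliest in it."""
    rank = {name: i for i, name in enumerate(hint_order)}
    pending = set(hint_order)
    dispatched: List[str] = []
    clock = 0.0
    while pending:
        ready = [n for n in pending if tasks[n][0] <= clock]
        if not ready:
            # Nothing is ready: idle only until the next input arrives.
            clock = min(tasks[n][0] for n in pending)
            continue
        chosen = min(ready, key=rank.__getitem__)  # hint order breaks ties
        clock += tasks[chosen][1]
        pending.remove(chosen)
        dispatched.append(chosen)
    return clock, dispatched


# Four independent forward micro-batches; the input for F1 arrives late.
HINT_ORDER = ["F0", "F1", "F2", "F3"]
TASKS = {"F0": (0.0, 1.0), "F1": (4.0, 1.0), "F2": (1.0, 1.0), "F3": (2.0, 1.0)}

if __name__ == "__main__":
    print(fixed_order_makespan(HINT_ORDER, TASKS))      # 7.0
    print(readiness_first_makespan(HINT_ORDER, TASKS))  # (5.0, ['F0', 'F2', 'F3', 'F1'])
```

In this toy case the fixed-order runner idles for three time units waiting on the delayed `F1`, whereas the readiness-first runner fills that gap with `F2` and `F3`. The numbers are illustrative only and say nothing about the speedups reported in the paper.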