Runtime-Readiness-First Pipeline (RRFP)
Recent events (1)
RRFP: A Readiness-Driven Runtime for Pipeline-Parallel Training Under Runtime Variability
The paper introduces Runtime-Readiness-First Pipeline (RRFP), a new runtime for pipeline-parallel large-model training that treats schedules as non-binding hint orders rather than strict execution sequences. By combining message-driven asynchronous communication, lightweight tensor-parallel coordination, and ready-set arbitration, RRFP dynamically dispatches work based on actual task readiness, reducing idle bubbles and stage misalignment. Implemented on a Megatron-based framework and evaluated at up to 128 GPUs, RRFP achieves up to 1.77× speedup on language-only workloads and 2.77× on multimodal workloads versus fixed-order baselines, and outperforms the fastest comparable external system by up to 1.84×.
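The difference between a binding schedule and a hint order can be illustrated with a toy, single-stage simulation. This is only a minimal sketch, not the paper's implementation: the function names (`run_fixed_order`, `run_readiness_first`), the task labels, and the `ready_at` and `duration` tables are all hypothetical, and readiness times are supplied as inputs rather than derived from real communication or tensor-parallel coordination. The arbitration rule assumed here (pick the ready task that appears earliest in the hint order) is one plausible reading of "ready-set arbitration", chosen for illustration.

```python
from typing import Dict, List, Tuple


def run_fixed_order(
    hint_order: List[str], ready_at: Dict[str, float], duration: Dict[str, float]
) -> Tuple[float, List[str]]:
    """Treat the schedule as binding: stall until the next task in order is ready."""
    clock = 0.0
    trace: List[str] = []
    for task in hint_order:
        # Idle (a "bubble") if the task's inputs have not arrived yet.
        clock = max(clock, ready_at[task]) + duration[task]
        trace.append(task)
    return clock, trace


def run_readiness_first(
    hint_order: List[str], ready_at: Dict[str, float], duration: Dict[str, float]
) -> Tuple[float, List[str]]:
    """Treat the schedule as a hint: dispatch whichever pending task is ready.

    Assumed arbitration rule: among ready tasks, prefer the one that
    appears earliest in the hint order.
    """
    clock = 0.0
    pending = list(hint_order)  # preserves hint order
    trace: List[str] = []
    while pending:
        ready = [t for t in pending if ready_at[t] <= clock]
        if not ready:
            # Nothing can run; advance to the next readiness time.
            clock = min(ready_at[t] for t in pending)
            continue
        task = ready[0]
        pending.remove(task)
        clock += duration[task]
        trace.append(task)
    return clock, trace


# Hypothetical stage: F1's input arrives late (t=5), the others by t=1.
hint_order = ["F0", "F1", "B0", "F2"]
ready_at = {"F0": 0.0, "F1": 5.0, "B0": 1.0, "F2": 1.0}
duration = {t: 1.0 for t in hint_order}

print(run_fixed_order(hint_order, ready_at, duration))
print(run_readiness_first(hint_order, ready_at, duration))
```

In this toy case the fixed-order run stalls on the late `F1` and finishes at time 8, while the readiness-first run fills the gap with `B0` and `F2` and finishes at time 6. The numbers are illustrative only and say nothing about the speedups reported in the paper.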