Ulysses Sequence Parallelism

technique · active · ulysses-sequence-parallelism-ec26a5a2 · 1 event · first seen 1mo ago

Aliases: Ulysses Sequence Parallelism

Recent events (1)

5 · Hugging Face Blog · 1mo ago

Ulysses Sequence Parallelism: Training with Million-Token Contexts

Hugging Face published a blog post on Ulysses sequence parallelism, a technique that distributes long-context training across multiple devices by partitioning the input along the sequence dimension. The post covers how Ulysses makes training with million-token context windows feasible by reducing per-device memory requirements, which addresses the ongoing challenge of scaling transformer training efficiently to very long sequences.
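
To make the idea of partitioning the sequence dimension concrete, here is a minimal single-process sketch in Python with NumPy. It is not taken from the blog post. It assumes the commonly described Ulysses layout: each device holds a slice of the sequence for all attention heads, an all-to-all exchange regroups the data so each device holds the full sequence for a subset of heads, attention runs locally, and a second all-to-all restores the sequence-sharded layout. The function names (`seq_to_head_shards`, `head_to_seq_shards`, `ulysses_attention`) are hypothetical, the "devices" are simulated as list entries, and there is no causal mask or real communication.

```python
import numpy as np

def attention(q, k, v):
    """Plain softmax attention; q, k, v have shape (heads, seq, dim)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def seq_to_head_shards(shards, num_devices):
    """Simulated all-to-all: (H, S/P, D) per device -> (H/P, S, D) per device."""
    return [
        # Device j gathers head group j from every sequence shard, in order.
        np.concatenate([np.split(s, num_devices, axis=0)[j] for s in shards], axis=1)
        for j in range(num_devices)
    ]

def head_to_seq_shards(shards, num_devices):
    """Simulated all-to-all: (H/P, S, D) per device -> (H, S/P, D) per device."""
    return [
        # Device i gathers sequence chunk i from every head group, in order.
        np.concatenate([np.split(s, num_devices, axis=1)[i] for s in shards], axis=0)
        for i in range(num_devices)
    ]

def ulysses_attention(q, k, v, num_devices):
    """Attention over sequence-sharded inputs, simulated on one process.

    Assumes the head count and sequence length are divisible by num_devices.
    """
    # Start from the sequence-parallel layout: each device owns S/P tokens.
    q_s, k_s, v_s = (np.split(x, num_devices, axis=1) for x in (q, k, v))
    # Regroup so each device sees the whole sequence for H/P heads.
    q_h, k_h, v_h = (seq_to_head_shards(x, num_devices) for x in (q_s, k_s, v_s))
    # Each device computes attention for its own heads independently.
    out_h = [attention(a, b, c) for a, b, c in zip(q_h, k_h, v_h)]
    # Return to the sequence-parallel layout and stitch shards for comparison.
    out_s = head_to_seq_shards(out_h, num_devices)
    return np.concatenate(out_s, axis=1)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8, 3)) for _ in range(3))
result = ulysses_attention(q, k, v, num_devices=2)
reference = attention(q, k, v)
```

In this sketch, `result` matches `reference` up to floating-point error, since the exchange only changes which device holds which slice. Outside the attention step, each simulated device holds only `S / P` tokens, which is where the per-device memory saving comes from.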