Ulysses Sequence Parallelism
technique · active
ulysses-sequence-parallelism-ec26a5a2 · 1 event · first seen 1mo ago
Aliases: Ulysses Sequence Parallelism
Recent events (1)
Ulysses Sequence Parallelism: Training with Million-Token Contexts
Hugging Face published a blog post on Ulysses sequence parallelism, a technique that distributes long-context training across multiple devices by partitioning the sequence dimension, so that each device holds only its own shard of the sequence. The post covers how this reduction in per-device memory requirements enables training with million-token context windows. The technique addresses the ongoing challenge of scaling transformer training efficiently to very long sequences.
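The summary above does not spell out the mechanism. As Ulysses-style sequence parallelism is commonly described, each device starts with a contiguous slice of the sequence, and an all-to-all exchange before attention re-shards the data so that each device instead holds the full sequence for a subset of attention heads; a second all-to-all afterwards restores the sequence sharding. The sketch below simulates that layout change in a single process with plain Python lists. The helper names (shard_sequence, all_to_all_seq_to_heads, all_to_all_heads_to_seq) are hypothetical and chosen for illustration; this is not the Hugging Face or DeepSpeed API, and it assumes the sequence length and head count are both divisible by the number of devices.

```python
def shard_sequence(tokens, world_size):
    """Split a [seq][heads] nested list into contiguous sequence shards."""
    seq_len = len(tokens)
    assert seq_len % world_size == 0, "sequence must divide evenly (assumption)"
    chunk = seq_len // world_size
    return [tokens[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_to_all_seq_to_heads(seq_shards):
    """Simulated all-to-all: sequence-sharded -> head-sharded.

    Input:  seq_shards[rank] has shape [seq / P][heads]
    Output: head_shards[rank] has shape [seq][heads / P]
    """
    world_size = len(seq_shards)
    num_heads = len(seq_shards[0][0])
    assert num_heads % world_size == 0, "heads must divide evenly (assumption)"
    heads_per_rank = num_heads // world_size
    head_shards = []
    for dst in range(world_size):
        rows = []
        # Visiting source ranks in order preserves the global token order.
        for src in range(world_size):
            for row in seq_shards[src]:
                rows.append(row[dst * heads_per_rank:(dst + 1) * heads_per_rank])
        head_shards.append(rows)
    return head_shards


def all_to_all_heads_to_seq(head_shards):
    """Simulated inverse all-to-all: head-sharded -> sequence-sharded."""
    world_size = len(head_shards)
    seq_len = len(head_shards[0])
    chunk = seq_len // world_size
    seq_shards = []
    for dst in range(world_size):
        rows = []
        for t in range(dst * chunk, (dst + 1) * chunk):
            row = []
            # Concatenate each rank's head slice to rebuild all heads for token t.
            for src in range(world_size):
                row.extend(head_shards[src][t])
            rows.append(row)
        seq_shards.append(rows)
    return seq_shards


# Toy example: 8 tokens, 4 heads, 2 simulated devices.
# Each element is a (token_index, head_index) tag standing in for a vector.
seq_len, num_heads, world_size = 8, 4, 2
tokens = [[(t, h) for h in range(num_heads)] for t in range(seq_len)]

seq_shards = shard_sequence(tokens, world_size)     # each rank: 4 tokens x 4 heads
head_shards = all_to_all_seq_to_heads(seq_shards)   # each rank: 8 tokens x 2 heads
restored = all_to_all_heads_to_seq(head_shards)     # back to 4 tokens x 4 heads
```

In this toy layout every device stores seq_len * num_heads / world_size elements both before and after the exchange, which illustrates why per-device memory shrinks as devices are added. In a real distributed run the exchange would be a collective operation across processes rather than list slicing.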