MagiAttention documentation#
Overview#
MagiAttention is a distributed attention (Context Parallelism) solution tailored to the demanding requirements of ultra-long sequences and heterogeneous masking patterns. It combines Flex-Flash-Attention (FFA), a kernel supporting distributable and flexible mask representations, with a dispatch solver for load-balanced computation and new Group Collective communication primitives that achieve zero-redundant communication. By coordinating these components through an adaptive multi-stage overlap strategy, MagiAttention delivers linear scalability across a broad range of training scenarios, such as large-scale video generation in Magi-1.
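To make the context-parallel data flow concrete, below is a minimal, deliberately naive sketch in PyTorch; the function name and sharding scheme are illustrative only and are not MagiAttention's API. Each rank owns one contiguous shard of the sequence and all-gathers the full K/V before attending, which is exactly the redundant traffic that MagiAttention's Group Collective primitives avoid by communicating only the K/V tokens each rank's mask region requires, overlapped with FFA compute.

```python
# Conceptual sketch of context parallelism (NOT MagiAttention's actual API).
# Assumes torch.distributed has already been initialized, e.g. via torchrun.
import torch
import torch.distributed as dist
import torch.nn.functional as F


def naive_context_parallel_attention(q_local, k_local, v_local):
    """q/k/v_local: [local_seq, heads, dim], the sequence shard owned by this rank.

    Naive baseline: every rank gathers the *full* K/V before computing
    attention for its local queries. MagiAttention instead communicates
    only the tokens demanded by each rank's mask (zero redundancy) and
    overlaps that communication with kernel execution.
    """
    world = dist.get_world_size()

    # Redundant all-gather of the complete key/value sequence.
    k_shards = [torch.empty_like(k_local) for _ in range(world)]
    v_shards = [torch.empty_like(v_local) for _ in range(world)]
    dist.all_gather(k_shards, k_local)
    dist.all_gather(v_shards, v_local)
    k_full = torch.cat(k_shards, dim=0)  # [total_seq, heads, dim]
    v_full = torch.cat(v_shards, dim=0)

    # Local queries attend over the entire gathered sequence.
    out = F.scaled_dot_product_attention(
        q_local.transpose(0, 1),   # [heads, local_seq, dim]
        k_full.transpose(0, 1),    # [heads, total_seq, dim]
        v_full.transpose(0, 1),
    )
    return out.transpose(0, 1)     # back to [local_seq, heads, dim]
```

Note also that this naive contiguous sharding leaves ranks with unequal work under non-uniform masks (e.g. causal), which is the imbalance MagiAttention's dispatch solver is designed to remove.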
We are committed to continually improving the performance and generality of MagiAttention for the broader research community. Stay tuned for exciting enhancements and new features on the horizon!
Contents
- User Guide
- Blogs
  - MagiAttention
  - Long-Context Attention Benchmark
  - Support Native Group Collective
  - Support Blackwell with FFA_FA4 Backend
  - Support Learnable Attention Sink
  - Support Muon QK-Clip
  - How to Ensure Kernels Actually Overlap
  - Support JIT Compilation in FFA
  - Flash Attention 2 Math Derivation
  - Optimize Sparse Attention in FFA (Coming Soon)
  - Dynamic Attention Solver (Coming Soon)
  - Distributed-Native FFA (Coming Soon)
  - Attention Engine for Inference (Coming Soon)