Posts tagged Multi-Stage Overlap
MagiAttention
- 21 April 2025
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training