Optimize Sparse Attention in FFA
Todo
This blog post is still in progress and will be released soon. Stay tuned!
Citation
If you find MagiAttention useful in your research, please cite:
@misc{magiattention2025,
  title={MagiAttention: A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training},
  author={Tao, Zewei and Huang, Yunpeng},
  year={2025},
  howpublished={\url{https://github.com/SandAI-org/MagiAttention/}},
}