Posts by Zewei Tao
How to Ensure Kernels Actually Overlap
- 15 February 2026
While the CPU-side scheduler controls the kernel launch order to favor overlap, the GPU's Hyper-Q hardware scheduler [Bradley, 2013] ultimately dictates the actual execution order, a process that is inherently non-deterministic and heavily influenced by transient GPU resource occupancy.
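To make the distinction concrete, here is a minimal, hypothetical PyTorch sketch (not MagiAttention's actual code): enqueuing kernels on separate CUDA streams only *permits* overlap on the host side; whether the kernels actually run concurrently is decided by the GPU scheduler at runtime.

```python
import torch

def launch_on_streams(n_streams: int = 2, size: int = 1024) -> int:
    """Enqueue one matmul per CUDA stream and return the number of streams used.

    Launching on distinct streams removes the host-side serialization, but the
    GPU hardware scheduler still decides whether the kernels overlap, depending
    on SM occupancy and other transient resource pressure.
    """
    if not torch.cuda.is_available():
        # Without a GPU there is nothing to overlap; fall back gracefully.
        return 0
    streams = [torch.cuda.Stream() for _ in range(n_streams)]
    inputs = [torch.randn(size, size, device="cuda") for _ in range(n_streams)]
    for stream, x in zip(streams, inputs):
        with torch.cuda.stream(stream):
            # Each matmul is enqueued on its own stream; no cross-stream
            # dependency is created, so overlap is *possible* but not promised.
            torch.matmul(x, x)
    torch.cuda.synchronize()
    return n_streams
```

In practice, profiling with a tool such as Nsight Systems is the only reliable way to confirm that two kernels overlapped on a given run.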
Attention Engine for Inference (Coming Soon)
- 08 February 2026
This blog post will be released soon. Stay tuned!
Support Blackwell with FFA_FA4 Backend
- 07 February 2026
Before the release of MagiAttention-v1.1.0, MagiAttention supported only Hopper GPUs, since its attention kernel backend, Flex-Flash-Attention (FFA), is built upon the open-source Flash-Attention 3 (FA3) [Shah et al., 2024], which is tailored to the SM90 compute capability.
Optimize Sparse Attention in FFA (Coming Soon)
- 25 January 2026
This blog post will be released soon. Stay tuned!
Support Native Group Collective
- 24 January 2026
With the release of MagiAttention-v1.1.0, we are excited to announce support for native group-collective CUDA kernels for both intranode and internode communication, built upon the amazing work of DeepEP [Zhao et al., 2025].
Dynamic Attention Solver (Coming Soon)
- 21 January 2026
This blog post will be released soon. Stay tuned!
MagiAttention
- 21 April 2025
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training