Posted in 2026
How to Ensure Kernels Actually Overlap
- 15 February 2026
While the CPU-side scheduler controls the kernel launch order to favor overlap, the GPU's Hyper-Q hardware scheduler [Bradley, 2013] ultimately dictates the actual execution order. This process is inherently non-deterministic and heavily influenced by transient GPU resource occupancy.
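As a concrete illustration (a minimal PyTorch sketch, not MagiAttention's scheduler; `compute_fn` and `comm_fn` are hypothetical placeholders), launching independent kernels on separate CUDA streams is the precondition for the hardware to overlap them at all:

```python
import torch

def launch_overlapping(compute_fn, comm_fn, x: torch.Tensor):
    """Submit two independent kernels on separate CUDA streams so the hardware
    scheduler is at least *allowed* to overlap them; whether they actually run
    concurrently still depends on SM/register/shared-memory occupancy at runtime."""
    s_compute = torch.cuda.Stream()
    s_comm = torch.cuda.Stream()

    # Both side streams must wait for any pending work on the default stream.
    s_compute.wait_stream(torch.cuda.current_stream())
    s_comm.wait_stream(torch.cuda.current_stream())

    with torch.cuda.stream(s_compute):
        y = compute_fn(x)   # e.g. an attention kernel
    with torch.cuda.stream(s_comm):
        z = comm_fn(x)      # e.g. a communication or memcpy kernel

    # Re-join: the default stream must not consume y or z before both finish.
    torch.cuda.current_stream().wait_stream(s_compute)
    torch.cuda.current_stream().wait_stream(s_comm)
    return y, z
```

Even with this structure in place, whether the two kernels truly execute concurrently depends on how many SMs, registers, and how much shared memory the first kernel leaves free.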
Distributed-Native FFA (Coming Soon)
- 14 February 2026
This post will be published soon. Stay tuned!
Attention Engine for Inference (Coming Soon)
- 08 February 2026
This post will be published soon. Stay tuned!
Support Blackwell with FFA_FA4 Backend
- 07 February 2026
Before the release of MagiAttention-v1.1.0, MagiAttention supported only Hopper GPUs, since its attention kernel backend, Flex-Flash-Attention (FFA), is built upon the open-source Flash-Attention 3 (FA3) [Shah et al., 2024], which is tailored to the SM90 compute capability.
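For illustration only, a capability-based dispatch along these lines is what ties FFA to specific GPU generations; the backend names and selection logic below are assumptions for the sketch, not MagiAttention's actual code:

```python
import torch

def select_ffa_backend() -> str:
    """Illustrative backend selection by compute capability (a sketch, not MagiAttention's API)."""
    major, minor = torch.cuda.get_device_capability()
    if major == 9:
        return "ffa_fa3"   # Hopper (SM90): the only supported target before v1.1.0
    if major >= 10:
        return "ffa_fa4"   # Blackwell: supported via the new FFA_FA4 backend
    raise RuntimeError(f"Unsupported compute capability: sm_{major}{minor}")
```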
Support Muon QK-Clip
- 04 February 2026
The Muon optimizer [Jordan et al., 2024], which leverages matrix orthogonalization, has shown faster convergence than traditional optimizers such as Adam [Kingma and Ba, 2017, Loshchilov and Hutter, 2019] on smaller language models, and was subsequently demonstrated by Kimi to scale to large models [Liu et al., 2025].
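As a rough sketch of the orthogonalization step at the heart of Muon (coefficients follow the open-source reference implementation; this is illustrative context, not MagiAttention's integration):

```python
import torch

@torch.no_grad()
def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Quintic Newton-Schulz iteration that approximately maps G to the nearest
    semi-orthogonal matrix, as used by Muon on its momentum-accumulated update."""
    a, b, c = 3.4445, -4.7750, 2.0315          # coefficients from the reference Muon code
    X = G.to(torch.bfloat16)
    transposed = X.size(-2) > X.size(-1)
    if transposed:
        X = X.mT
    X = X / (X.norm(dim=(-2, -1), keepdim=True) + 1e-7)   # keep spectral norm <= 1
    for _ in range(steps):
        A = X @ X.mT
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.mT
    return X.to(G.dtype)
```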
Optimize Sparse Attention in FFA (Coming Soon)
- 25 January 2026
This post will be published soon. Stay tuned!
Support Native Group Collective
- 24 January 2026
With the release of MagiAttention-v1.1.0, we are excited to announce support for native group collective CUDA kernels for both intranode and internode communication, built upon the amazing work of DeepEP [Zhao et al., 2025].
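For intuition, the sketch below spells out the reference semantics of such a group collective using ordinary point-to-point operations; the native kernels fuse this pattern into dedicated CUDA kernels. All names here are illustrative assumptions, not MagiAttention's API:

```python
import torch
import torch.distributed as dist

def group_cast_reference(kv_chunk: torch.Tensor, dst_ranks: list[int], src_ranks: list[int]):
    """Reference semantics of a group-cast-style exchange: each rank sends its local
    chunk to every destination rank and receives one chunk from every source rank.
    A native group collective kernel replaces this batch of point-to-point ops."""
    recv_bufs = [torch.empty_like(kv_chunk) for _ in src_ranks]
    ops = [dist.P2POp(dist.isend, kv_chunk, dst) for dst in dst_ranks]
    ops += [dist.P2POp(dist.irecv, buf, src) for src, buf in zip(src_ranks, recv_bufs)]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return recv_bufs
```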
Dynamic Attention Solver (Coming Soon)
- 21 January 2026
This post will be published soon. Stay tuned!