Posted in 2026

How to Ensure Kernels Actually Overlap

While the host-side scheduler controls the kernel launch order to favor overlapping, the GPU’s Hyper-Q hardware scheduler [Bradley, 2013] ultimately dictates the actual execution order, a process that is inherently non-deterministic and heavily influenced by transient GPU resource occupancy.
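
As a minimal sketch using plain PyTorch streams (rather than MagiAttention’s internal scheduler): enqueuing kernels on separate CUDA streams only makes overlap possible, while the hardware scheduler decides whether they actually run concurrently.

    # Minimal sketch: two independent matmuls enqueued on separate CUDA streams.
    # Different streams only *permit* overlap; whether the kernels actually run
    # concurrently depends on free SMs, registers, and shared memory at launch time.
    import torch

    assert torch.cuda.is_available()
    a = torch.randn(4096, 4096, device="cuda")
    b = torch.randn(4096, 4096, device="cuda")

    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
    torch.cuda.synchronize()
    with torch.cuda.stream(s1):
        c1 = a @ b   # enqueued on stream s1
    with torch.cuda.stream(s2):
        c2 = b @ a   # enqueued on stream s2, may (or may not) overlap with s1
    torch.cuda.synchronize()
    # Inspect the realized schedule with a profiler trace; overlap is not guaranteed.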

Read more ...


Distributed-Native FFA (Coming Soon)

The upcoming blog post will be released in the near future. Stay tuned!

Read more ...


Attention Engine for Inference (Coming Soon)

The upcoming blog post will be released in the near future. Stay tuned!

Read more ...


Support Blackwell with FFA_FA4 Backend

Before the release of MagiAttention-v1.1.0, MagiAttention supported only Hopper GPUs, since its attention kernel backend, Flex-Flash-Attention (FFA), is built upon the open-source Flash-Attention 3 (FA3) [Shah et al., 2024], which is tailored to the SM90 compute capability.
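
As a rough sketch of why a separate backend is needed (the helper below is hypothetical, not MagiAttention’s actual dispatch code): the device’s compute capability determines which kernel family can run, with Hopper reporting SM90 and Blackwell reporting SM100 or above.

    # Hypothetical backend-selection sketch based on compute capability;
    # MagiAttention's real dispatch logic may differ.
    import torch

    def pick_ffa_backend() -> str:
        major, minor = torch.cuda.get_device_capability()
        if (major, minor) == (9, 0):   # Hopper (SM90): FA3-based FFA backend
            return "ffa_fa3"
        if major >= 10:                # Blackwell (SM100+): FA4-based FFA backend
            return "ffa_fa4"
        raise RuntimeError(f"unsupported compute capability sm_{major}{minor}")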

Read more ...


Support Muon QK-Clip

The Muon optimizer [Jordan et al., 2024], which leverages matrix orthogonalization, has shown faster convergence than traditional optimizers such as Adam [Kingma and Ba, 2017, Loshchilov and Hutter, 2019] on smaller language models and was subsequently demonstrated to scale to large models by Kimi [Liu et al., 2025].
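
The orthogonalization at the heart of Muon is commonly implemented with a Newton–Schulz iteration; the sketch below follows the coefficients of the public Muon reference implementation and is only an illustration, not the exact optimizer code used here.

    # Sketch of the Newton-Schulz iteration Muon uses to map a 2D momentum
    # matrix G to an approximately semi-orthogonal update. Coefficients follow
    # the public Muon reference implementation; illustration only.
    import torch

    def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
        transposed = X.size(0) > X.size(1)
        if transposed:
            X = X.T
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if transposed else X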

Read more ...


Optimize Sparse Attention in FFA (Coming Soon)

The upcoming blog post will be released in the near future. Stay tuned!

Read more ...


Support Native Group Collective

With the release of MagiAttention-v1.1.0, we are excited to announce support for native group collective CUDA kernels for both intranode and internode communication, based upon the amazing work of DeepEP [Zhao et al., 2025].
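
To make the term concrete: a group collective lets each rank cast a tensor to an arbitrary subset of peers and receive tensors back from another subset. Without native kernels, such patterns are typically emulated with batched point-to-point operations in torch.distributed, roughly as sketched below (the helper name and signature are illustrative, not MagiAttention’s API).

    # Illustrative emulation of a "group cast" with batched point-to-point ops;
    # the native kernels replace patterns like this with dedicated intranode /
    # internode CUDA kernels. Function name and argument layout are hypothetical.
    # Assumes the default torch.distributed process group is already initialized.
    import torch.distributed as dist

    def group_cast(send_buf, recv_bufs, dst_ranks, src_ranks):
        ops = [dist.P2POp(dist.isend, send_buf, dst) for dst in dst_ranks]
        ops += [dist.P2POp(dist.irecv, buf, src) for buf, src in zip(recv_bufs, src_ranks)]
        if ops:
            for req in dist.batch_isend_irecv(ops):
                req.wait()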

Read more ...


Dynamic Attention Solver (Coming Soon)

The upcoming blog post will be released in the near future. Stay tuned!

Read more ...