Posted in 2026
How to Ensure Kernels Actually Overlapped
- 15 February 2026
The CPU scheduler controls kernel launch order to favor overlap, but the GPU Hyper-Q driver [Bradley, 2013] ultimately determines the actual execution order non-deterministically, and that order also depends on transient GPU resource occupancy.
Distributed-Native FFA
- 14 February 2026
The upcoming blog post will be released in the near future. Stay tuned!
Attention Engine for Inference
- 08 February 2026
The upcoming blog post will be released in the near future. Stay tuned!
Support Blackwell with FFA_FA4 Backend
- 07 February 2026
The upcoming blog post will be released in the near future. Stay tuned!
Support Muon QK-Clip
- 04 February 2026
The Muon optimizer [Jordan et al., 2024], which leverages matrix orthogonalization, has shown faster convergence than traditional optimizers such as Adam [Kingma and Ba, 2017; Loshchilov and Hutter, 2019] on smaller language models, and Kimi [Liu et al., 2025] subsequently demonstrated that it scales to large models.
Optimize Sparse Attention in FFA
- 25 January 2026
The upcoming blog post will be released in the near future. Stay tuned!
Support Native Group Collective Based on DeepEP
- 24 January 2026
The upcoming blog post will be released in the near future. Stay tuned!
Dynamic Attention Solver
- 21 January 2026
The upcoming blog post will be released in the near future. Stay tuned!