Posts by Jin Li
Support Muon QK-Clip
- 04 February 2026
The Muon optimizer [Jordan et al., 2024], which leverages matrix orthogonalization, has shown faster convergence than traditional optimizers such as Adam [Kingma and Ba, 2017] and AdamW [Loshchilov and Hutter, 2019] on smaller language models, and was subsequently demonstrated by Kimi to scale to large models [Liu et al., 2025].
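At Muon's core, the update is the momentum matrix pushed toward the nearest orthogonal matrix via a few Newton-Schulz iterations. The sketch below is a minimal, framework-free illustration of that orthogonalization step (shown in NumPy for clarity; the coefficients follow the public Muon reference implementation, and `newton_schulz_orthogonalize` is an illustrative name, not an API from this post):

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g with a quintic Newton-Schulz iteration.

    Coefficients (a, b, c) are the tuned values from the Muon reference
    implementation; they drive singular values toward ~1 quickly rather
    than converging to exactly 1.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize by the Frobenius norm so all singular values are <= 1,
    # which the iteration requires for stability.
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation for cheaper matmuls
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    return x.T if transposed else x
```

In a full optimizer step, this map is applied to the momentum buffer of each 2-D weight matrix before scaling by the learning rate; non-matrix parameters (embeddings, norms, biases) typically fall back to AdamW.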
Optimize Sparse Attention in FFA (Coming Soon)
- 25 January 2026
This post is in progress and will be published soon. Stay tuned!
Dynamic Attention Solver (Coming Soon)
- 21 January 2026
This post is in progress and will be published soon. Stay tuned!
MagiAttention
- 21 April 2025
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training