Posts by Jin Li

Support Muon QK-Clip

The Muon optimizer [Jordan et al., 2024], which leverages matrix orthogonalization, has shown faster convergence than traditional optimizers such as Adam [Kingma and Ba, 2017, Loshchilov and Hutter, 2019] on smaller language models, and was subsequently shown by Kimi [Liu et al., 2025] to scale to large models.
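To make "matrix orthogonalization" concrete, the core of Muon replaces the raw gradient-based update matrix with an approximately orthogonalized one via a Newton–Schulz iteration. Below is a minimal NumPy sketch of that step; the quintic coefficients follow Jordan et al., 2024, but the function name and the fixed 5-step default here are illustrative, not the reference implementation.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately orthogonalize update matrix g (a sketch of Muon's core step).

    Drives the singular values of g toward 1 while keeping its singular
    vectors, using the odd quintic iteration X <- aX + (bA + cA^2)X with
    A = X X^T. Coefficients are from Jordan et al., 2024.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Scale by the Frobenius norm so all singular values start in [0, 1].
    x = g / (np.linalg.norm(g) + eps)
    # Work with the wide orientation so A = X X^T is the smaller Gram matrix.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * (A @ A)) @ x
    return x.T if transposed else x
```

After a few iterations the singular values land roughly in a band around 1 rather than exactly at 1; Muon trades that exactness for a cheap, matmul-only inner loop that runs well on accelerators.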

Read more ...


Optimize Sparse Attention in FFA (Coming Soon)

This post is coming soon. Stay tuned!

Read more ...


Dynamic Attention Solver (Coming Soon)

This post is coming soon. Stay tuned!

Read more ...


MagiAttention

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training

Read more ...