Posts tagged Distributed Attention
How to Ensure Kernels Actually Overlapped
- 15 February 2026
Although the CPU scheduler controls kernel launch order to favor overlap, the GPU Hyper-Q driver [Bradley, 2013] ultimately determines the actual execution order. That order is non-deterministic and also depends on transient GPU resource occupancy.
Distributed-Native FFA
- 14 February 2026
The upcoming blog post will be released in the near future. Stay tuned!
Attention Engine for Inference
- 08 February 2026
The upcoming blog post will be released in the near future. Stay tuned!
Support Native Group Collective Based on DeepEP
- 24 January 2026
The upcoming blog post will be released in the near future. Stay tuned!
Dynamic Attention Solver
- 21 January 2026
The upcoming blog post will be released in the near future. Stay tuned!
MagiAttention
- 21 April 2025
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training