Posts tagged Distributed Attention

How to Ensure Kernels Actually Overlapped

15 February 2026

While the CPU scheduler controls kernel launch order to favor overlap, the GPU Hyper-Q driver [Bradley, 2013] ultimately determines the actual execution order. That order is non-deterministic and also depends on transient GPU resource occupancy.

Distributed-Native FFA

14 February 2026

This blog post will be released soon. Stay tuned!

Attention Engine for Inference

08 February 2026

This blog post will be released soon. Stay tuned!

Support Native Group Collective Based on DeepEP

24 January 2026

This blog post will be released soon. Stay tuned!

Dynamic Attention Solver

21 January 2026

This blog post will be released soon. Stay tuned!

Long-Context Attention Benchmark

19 October 2025

From Kernel Efficiency to Distributed Scalability

MagiAttention

21 April 2025

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training
