Blogs
- MagiAttention
- Long-Context Attention Benchmark
- Support Native Group Collective Based on DeepEP
- Support Blackwell with FFA_FA4 Backend
- Support Learnable Attention Sink
- Support Muon QK-Clip
- Optimize Sparse Attention in FFA
- Dynamic Attention Solver
- How to Ensure Kernels Actually Overlapped
- Distributed-Native FFA
- Attention Engine for Inference
- Flash Attention 2 Math Derivation
- Support JIT Compilation in FFA