MagiAttention

  • User Guide
  • Blogs
  • Github
  • Blog

Recent Posts

  • 15 February - How to Ensure Kernels Actually Overlapped
  • 14 February - Distributed-Native FFA
  • 08 February - Attention Engine for Inference
  • 07 February - Support Blackwell with FFA_FA4 Backend
  • 04 February - Support Muon QK-Clip

Tags

  • AF Disaggregation
  • Attention Sink
  • Attention Slice Representation
  • Benchmark
  • Blackwell
  • Computation Load-Balance
  • Computation-Communication Overlap
  • Context Parallelism
  • DSA
  • DeepEP
  • Distributed Attention
  • Dynamic Load Balance
  • Flash-Attention
  • Flex-Flash-Attention
  • Group Collective
  • HSTU Function Representation
  • Hybrid Attention
  • Multi-Stage Overlap
  • Muon
  • NSA
  • QK-Clip
  • Sparse Attention
  • Zero-Redundant Communication

Categories

  • MagiAttention (12)

Archives

  • 2026 (8)
  • 2025 (4)

Authors

  • Bowen Zeng (3)
  • Hanwen Sun (3)
  • Jerry Chen (1)
  • Jin Li (4)
  • Kunlun Li (1)
  • Qiangang Wang (4)
  • Tao Bu (2)
  • Yufeng Yang (1)
  • Yujia Liu (1)
  • Yunpeng Huang (11)
  • Zewei Tao (7)

Languages

  • English (12)

Locations

  • China (12)

Posts tagged Computation Load-Balance

MagiAttention

  • 21 April 2025
  • Zewei Tao, Yunpeng Huang, Qiangang Wang, Hanwen Sun, Jin Li, Tao Bu, Bowen Zeng
  • China
  • English
  • MagiAttention
  • Attention Slice Representation, Computation Load-Balance, Zero-Redundant Communication, Multi-Stage Overlap, Flex-Flash-Attention, Group Collective, Flash-Attention, Distributed Attention, Context Parallelism

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training

© Copyright 2025-2026, Sandai.

Created using Sphinx 9.1.0.

Built with the PyData Sphinx Theme 0.16.1.