MagiAttention

  • User Guide
  • Blogs
  • Github
  • Blog

Recent Posts

  • 15 February - How to Ensure Kernels Actually Overlapped
  • 14 February - Distributed-Native FFA
  • 08 February - Attention Engine for Inference
  • 07 February - Support Blackwell with FFA_FA4 Backend
  • 04 February - Support Muon QK-Clip

Tags

  • AF Disaggregation
  • Attention Sink
  • Attention Slice Representation
  • Benchmark
  • Blackwell
  • Computation Load-Balance
  • Computation-Communication Overlap
  • Context Parallelism
  • DSA
  • DeepEP
  • Distributed Attention
  • Dynamic Load Balance
  • Flash-Attention
  • Flex-Flash-Attention
  • Group Collective
  • HSTU Function Representation
  • Hybrid Attention
  • Multi-Stage Overlap
  • Muon
  • NSA
  • QK-Clip
  • Sparse Attention
  • Zero-Redundant Communication

Categories

  • MagiAttention (12)

Archives

  • 2026 (8)
  • 2025 (4)

Authors

  • Bowen Zeng (3)
  • Hanwen Sun (3)
  • Jerry Chen (1)
  • Jin Li (4)
  • Kunlun Li (1)
  • Qiangang Wang (4)
  • Tao Bu (2)
  • Yufeng Yang (1)
  • Yujia Liu (1)
  • Yunpeng Huang (11)
  • Zewei Tao (7)

Languages

  • English (12)

Locations

  • China (12)

Posts tagged Computation Load-Balance

MagiAttention

  • 21 April 2025
  • Zewei Tao, Yunpeng Huang, Qiangang Wang, Hanwen Sun, Jin Li, Tao Bu, Bowen Zeng
  • China
  • English
  • MagiAttention
  • Attention Slice Representation, Computation Load-Balance, Zero-Redundant Communication, Multi-Stage Overlap, Flex-Flash-Attention, Group Collective, Flash-Attention, Distributed Attention, Context Parallelism

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Mask Training

© Copyright 2025-2026, Sandai.

Created using Sphinx 9.1.0.

Built with the PyData Sphinx Theme 0.16.1.