MagiAttention

  • User Guide
  • Blogs
  • Github
  • Blog

Recent Posts

  • 15 February - How to Ensure Kernels Actually Overlap
  • 14 February - Distributed-Native FFA (Coming Soon)
  • 08 February - Attention Engine for Inference (Coming Soon)
  • 07 February - Support Blackwell with FFA_FA4 Backend
  • 04 February - Support Muon QK-Clip

Tags

  • AF Disaggregation
  • Attention Sink
  • Attention Slice Representation
  • Benchmark
  • Blackwell
  • Collective Communication
  • Computation Load-Balance
  • Computation-Communication Overlap
  • Context Parallelism
  • DSA
  • DeepEP
  • Distributed Attention
  • Dynamic Load Balance
  • Flash-Attention
  • Flex-Flash-Attention
  • Group Collective
  • HSTU Function Representation
  • Hybrid Attention
  • Multi-Stage Overlap
  • Muon
  • NSA
  • QK-Clip
  • Sparse Attention
  • Zero-Redundant Communication

Categories

  • MagiAttention (12)

Archives

  • 2026 (8)
  • 2025 (4)

Authors

  • Bowen Zeng (3)
  • Hanwen Sun (3)
  • Jerry Chen (1)
  • Jin Li (4)
  • Kunlun Li (1)
  • Qiangang Wang (3)
  • Tao Bu (2)
  • Yufeng Yang (1)
  • Yujia Liu (1)
  • Yunpeng Huang (11)
  • Zewei Tao (8)

Languages

  • English (12)

Locations

  • China (12)

Posts tagged DSA

Optimize Sparse Attention in FFA (Coming Soon)

  • 25 January 2026
  • Zewei Tao, Hanwen Sun, Bowen Zeng, Jin Li, Yunpeng Huang
  • China
  • English
  • MagiAttention
  • Sparse Attention, NSA, DSA, Flex-Flash-Attention, Flash-Attention

This blog post will be released soon. Stay tuned!



© Copyright 2025-2026, Sandai.
