
MagiAttention

  • User Guide
  • Blogs
  • Github
  • Blog

Recent Posts

  • 15 February - How to Ensure Kernels Actually Overlapped
  • 14 February - Distributed-Native FFA
  • 08 February - Attention Engine for Inference
  • 07 February - Support Blackwell with FFA_FA4 Backend
  • 04 February - Support Muon QK-Clip

Tags

  • AF Disaggregation
  • Attention Sink
  • Attention Slice Representation
  • Benchmark
  • Blackwell
  • Computation Load-Balance
  • Computation-Communication Overlap
  • Context Parallelism
  • DSA
  • DeepEP
  • Distributed Attention
  • Dynamic Load Balance
  • Flash-Attention
  • Flex-Flash-Attention
  • Group Collective
  • HSTU Function Representation
  • Hybrid Attention
  • Multi-Stage Overlap
  • Muon
  • NSA
  • QK-Clip
  • Sparse Attention
  • Zero-Redundant Communication

Categories

  • MagiAttention (12)

Archives

  • 2026 (8)
  • 2025 (4)

Authors

  • Bowen Zeng (3)
  • Hanwen Sun (3)
  • Jerry Chen (1)
  • Jin Li (4)
  • Kunlun Li (1)
  • Qiangang Wang (4)
  • Tao Bu (2)
  • Yufeng Yang (1)
  • Yujia Liu (1)
  • Yunpeng Huang (11)
  • Zewei Tao (7)

Languages

  • English (12)

Locations

  • China (12)

Posts by Yufeng Yang

Support Blackwell with FFA_FA4 Backend

  • 07 February 2026
  • Jerry Chen, Yujia Liu, Yufeng Yang, Yunpeng Huang, Zewei Tao, Qiangang Wang, Kunlun Li
  • China
  • English
  • MagiAttention
  • Blackwell, Flex-Flash-Attention, Flash-Attention, HSTU Function Representation

This blog post will be published soon. Stay tuned!



© Copyright 2025-2026, Sandai.

Created using Sphinx 9.1.0.

Built with the PyData Sphinx Theme 0.16.1.