Installation#

Warning

MagiAttention currently supports only the NVIDIA Hopper and Blackwell GPU architectures. We are actively working to support more GPU architectures in upcoming releases.
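
To check whether your GPUs match one of these architectures, you can query their compute capability (Hopper reports 9.0, while Blackwell reports 10.x or 12.x; the compute_cap query field requires a reasonably recent NVIDIA driver):

    nvidia-smi --query-gpu=name,compute_cap --format=csv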

Setup Environment#

Activate an NGC-PyTorch Container#

Tip

We recommend using the standard NGC-PyTorch Docker releases to keep basic dependencies such as Python, CUDA, and PyTorch consistent.

  • docker run command:

    # choose one compatible version
    MAJOR_VERSION=25
    MINOR_VERSION=10 # choose from {05, 06, 08, 09, 10}
    
    # specify your own names and paths
    CONTAINER_NAME=...
    HOST_MNT_ROOT=...
    CONTAINER_MNT_ROOT=...
    
    docker run --name ${CONTAINER_NAME} -v ${HOST_MNT_ROOT}:${CONTAINER_MNT_ROOT} -it -d --privileged --gpus all --network host --ipc host --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/pytorch:${MAJOR_VERSION}.${MINOR_VERSION}-py3 /bin/bash
    
  • docker exec command:

    docker exec -it ${CONTAINER_NAME} /bin/bash
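
Once inside the container, you can optionally run a quick sanity check of the base environment using standard PyTorch APIs shipped with the NGC image:

    # print the PyTorch version, the CUDA version it was built with,
    # and whether the GPUs are visible to PyTorch
    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"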
    

Pull Source Code#

  • git commands:

    git clone https://github.com/SandAI-org/MagiAttention.git
    
    cd MagiAttention
    
    git submodule update --init --recursive
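
To confirm the submodules were initialized, you can list their status; a leading "-" in the output marks a submodule that is still uninitialized:

    git submodule status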
    

Enable IBGDA (optional)#

Note

If you would like to try our native group-collective kernels as the communication backend when cp_size > 8, i.e. for a process group involving both intranode peers (connected through NVLink) and internode peers (visible through RDMA), you're required to enable IBGDA on your bare-metal host machine.

Warning

This step needs to be performed on the BARE-METAL HOST OPERATING SYSTEM, NOT inside a Docker or other containerized environment, as containers do not manage the host kernel.

  • bash script:

    bash scripts/enable_ibgda_on_host.sh
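
After the script finishes (and after a reboot, if it requests one), one way to verify the change is to inspect the NVIDIA driver's module parameters. This is a hedged check that assumes the script follows the common IBGDA setup of setting the driver's PeerMappingOverride registry key:

    # expect PeerMappingOverride=1 to appear among the registry dwords
    grep -i registrydwords /proc/driver/nvidia/params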
    

Setup Dependencies#

Install Required Packages#

  • pip install command:

    pip install -r requirements.txt
    

Install flash_attn_cute (optional)#

Note

If you would like to try MagiAttention on Blackwell, you're currently required to install the flash_attn_cute package to enable the FFA_FA backend as a temporary workaround.

  • bash script:

    bash scripts/install_flash_attn_cute.sh
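
You can then verify the installation by importing the package. Note that the module name below is assumed from the package name above and may differ in your environment:

    # assumed import name; adjust if the installed module is named differently
    python -c "import flash_attn_cute" && echo "flash_attn_cute OK"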
    

Install MagiAttention#

Install MagiAttention From Source#

Warning

This process may take around 10-20 minutes the first time and can occupy up to 90% of your CPU resources.

Note

We provide several environment variables for fine-grained control over the installation process, especially for building the CUDA extension modules. A quick post-install sanity check is sketched after the commands below.

  • pip install command for Hopper:

    pip install --no-build-isolation .
    
  • pip install command for Blackwell:

    export MAGI_ATTENTION_PREBUILD_FFA=0 # skip pre-building the FFA kernels during installation
    pip install --no-build-isolation .
    
    export MAGI_ATTENTION_FA4_BACKEND=1 # always set this when using MagiAttention on Blackwell
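
After either install path completes, a minimal post-install sanity check is to import the package (the version attribute is read defensively here, since we do not assume it exists):

    python -c "import magi_attention; print(getattr(magi_attention, '__version__', 'installed'))"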
    

PreCompile FFA_FA4 kernels (optional)#

Note

If you would like to try MagiAttention on Blackwell and have already installed both magi_attention and flash_attn_cute to enable the FFA_FA backend, we further recommend pre-compiling the common cases of the FFA_FA4 kernels before production usage: since they are built upon CuteDSL, this avoids runtime JIT re-compilation overhead.

The pre-compiled kernels are cached under /path/to/magi_attention/lib/ffa_fa4_cache/ by default; set the environment variable MAGI_ATTENTION_FFA_FA4_CACHE_DIR to use a custom cache directory instead.

  • python script:

    # You can change the cases to pre-compile in the script according to your needs;
    # the whole pre-compilation process is logged in the terminal via tqdm,
    # so you can track its progress and results.
    python tools/precompile_ffa_fa4.py
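
If you need a custom cache location, set the environment variable documented above before running the script:

    # optional: redirect the pre-compiled kernel cache to a custom directory
    export MAGI_ATTENTION_FFA_FA4_CACHE_DIR=/your/custom/cache/dir
    python tools/precompile_ffa_fa4.py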