Installation#
Warning
MagiAttention currently supports only the Hopper and Blackwell GPU architectures. We are actively working to support more architectures in upcoming releases.
Setup Environment#
Activate an NGC-PyTorch Container#
Tip
We recommend using the standard NGC-PyTorch Docker releases for consistency of basic dependencies such as Python, CUDA, and PyTorch.
Warning
Due to a performance issue with CUDA-12, we recommend CUDA-13+ based NGC-PyTorch containers for optimal performance.
Accordingly, the setup.py script asserts the CUDA version and aborts the installation if it is lower than 13.0.
If you insist on using a CUDA-12 based container, you can set the environment variable MAGI_ATTENTION_ALLOW_BUILD_WITH_CUDA12=1, but be aware that this may lead to significant performance degradation compared to CUDA-13+.
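As a rough sketch (not part of MagiAttention), you can check the container's CUDA toolkit version yourself before building and only then opt in to the CUDA-12 override; the version-parsing one-liner below is illustrative:

```shell
# Illustrative sketch: parse the CUDA toolkit major version from `nvcc --version`
# (run inside the container, where nvcc is available) and opt in to a CUDA-12
# build explicitly rather than by accident.
release=$(nvcc --version 2>/dev/null | grep -o 'release [0-9]*' | head -n1)
cuda_major=${release#release }
if [ "${cuda_major:-0}" -lt 13 ]; then
    echo "CUDA major '${cuda_major:-unknown}' is below 13: expect degraded performance"
    export MAGI_ATTENTION_ALLOW_BUILD_WITH_CUDA12=1  # explicit opt-in
fi
```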
docker run command:
# choose one compatible version
MAJOR_VERSION=25
MINOR_VERSION=10

# specify your own names and paths
CONTAINER_NAME=...
HOST_MNT_ROOT=...
CONTAINER_MNT_ROOT=...

docker run --name ${CONTAINER_NAME} \
    -v ${HOST_MNT_ROOT}:${CONTAINER_MNT_ROOT} \
    -it -d --privileged --gpus all \
    --network host --ipc host \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    nvcr.io/nvidia/pytorch:${MAJOR_VERSION}.${MINOR_VERSION}-py3 \
    /bin/bash
docker exec command:
docker exec -it ${CONTAINER_NAME} /bin/bash
Pull Source Code#
git commands:
git clone https://github.com/SandAI-org/MagiAttention.git
cd MagiAttention
git submodule update --init --recursive
Enable IBGDA (optional)#
Note
If you would like to try our native group-collective kernels as the communication backend when cp_size > 8, i.e. when a process group involves both intra-node peers (connected through NVLink) and inter-node peers (visible through RDMA), you are required to enable IBGDA on your bare-metal host machine.
Warning
This step needs to be performed on the BARE-METAL HOST OPERATING SYSTEM, NOT inside a Docker or other containerized environment, as containers do not manage the host kernel.
bash script:
bash scripts/enable_ibgda_on_host.sh
Setup Dependencies#
Install Required Packages#
pip install command:
pip install -r requirements.txt
Install flash_attn_cute (optional)#
Note
If you would like to try MagiAttention on Blackwell, you are currently required to install the flash_attn_cute package to enable the FFA_FA backend as a temporary workaround.
bash script:
bash scripts/install_flash_attn_cute.sh
Install MagiAttention#
Install MagiAttention From Source#
Warning
This process may take around 10-20 minutes and can occupy up to 90% of CPU resources the first time.
Note
Several environment variables provide fine-grained control over the installation process, especially the building of the CUDA extension modules.
pip install command for Hopper:
pip install --no-build-isolation .
pip install command for Blackwell:
export MAGI_ATTENTION_PREBUILD_FFA=0
pip install --no-build-isolation .
export MAGI_ATTENTION_FA4_BACKEND=1  # always set it when using MagiAttention on Blackwell
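For reference, the build-time switches mentioned in this guide can be combined; the sketch below gathers them in one place (the values shown are illustrative, pick only the ones that apply to your setup):

```shell
# Illustrative combination of the environment variables covered in this guide.
export MAGI_ATTENTION_ALLOW_BUILD_WITH_CUDA12=1  # only if stuck on a CUDA-12 container (slower)
export MAGI_ATTENTION_PREBUILD_FFA=0             # skip pre-building FFA kernels (Blackwell)
pip install --no-build-isolation .
export MAGI_ATTENTION_FA4_BACKEND=1              # required at runtime on Blackwell
```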
PreCompile FFA_FA4 kernels (optional)#
Note
If you would like to try MagiAttention on Blackwell and have already installed both magi_attention and flash_attn_cute to enable the FFA_FA backend, we further recommend pre-compiling the common cases for the FFA_FA4 kernels before production usage to avoid runtime JIT re-compilation overhead, since they are built upon the CuTe Python DSL.
The cache directory for pre-compiled kernels defaults to /path/to/magi_attention/lib/ffa_fa4_cache/, and can be overridden by setting the environment variable MAGI_ATTENTION_FFA_FA4_CACHE_DIR to a custom directory if needed.
python script:
# You can change the cases to pre-compile in the script according to your needs,
# and the whole pre-compilation process will be richly logged
# in the terminal by tqdm, for you to track the progress and results.
python tools/precompile_ffa_fa4.py
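If you override the cache directory, a typical invocation might look like the following sketch (the mount path is illustrative):

```shell
# Illustrative: pre-compile the FFA_FA4 kernels into a custom, persistent
# cache directory so they survive container rebuilds (path is an example).
export MAGI_ATTENTION_FFA_FA4_CACHE_DIR=/mnt/shared/ffa_fa4_cache
python tools/precompile_ffa_fa4.py
```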