Environment Variables

In MagiAttention, many features are configured through environment variables. The variables that can be set are listed below, along with their descriptions.
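These are ordinary process environment variables, so they can be exported in the shell that launches the job or set from Python. The sketch below sets a few of them from Python. It assumes nothing about when magi_attention reads them, so it sets them before the (commented-out) import to be safe. The `env_flag` helper is hypothetical and only mirrors the "set to 1 to enable" convention used on this page.

```python
import os

# Set the variables before importing magi_attention, so that they are
# visible no matter when the library reads them.
os.environ["MAGI_ATTENTION_HIERARCHICAL_COMM"] = "1"
os.environ["MAGI_ATTENTION_FFA_FORWARD_SM_MARGIN"] = "4"
os.environ["MAGI_ATTENTION_SANITY_CHECK"] = "0"

# import magi_attention  # import only after the variables are set


def env_flag(name: str, default: str = "0") -> bool:
    """Hypothetical helper: treat a variable as enabled only when it equals "1"."""
    return os.environ.get(name, default) == "1"


print(env_flag("MAGI_ATTENTION_HIERARCHICAL_COMM"))  # True
print(env_flag("MAGI_ATTENTION_SANITY_CHECK"))  # False
```

When jobs are started with a multi-process launcher such as torchrun, exporting the variables in the launching shell makes them visible to every rank, because child processes inherit the environment.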

For Performance

MAGI_ATTENTION_HIERARCHICAL_COMM

Set MAGI_ATTENTION_HIERARCHICAL_COMM to 1 to enable hierarchical group-collective communication within the 2-dim cp (context-parallel) group (inter_node group + intra_node group).

Note

This is currently a temporary solution to reduce redundant inter-node communication, and it may be removed or updated in the future.
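To make the 2-dim cp group concrete: one common way to split a context-parallel group that spans several nodes is to group the ranks on the same node (intra_node) and the ranks with the same local index across nodes (inter_node). The sketch below computes such a partition in pure Python. It assumes contiguous rank placement with a fixed number of GPUs per node, and `build_2d_cp_groups` is a hypothetical name; this is an illustration, not MagiAttention's actual group construction.

```python
def build_2d_cp_groups(cp_size: int, gpus_per_node: int):
    """Split ranks 0..cp_size-1 into intra-node and inter-node groups.

    Assumes ranks are placed contiguously: rank r lives on node
    r // gpus_per_node with local index r % gpus_per_node.
    """
    if cp_size % gpus_per_node != 0:
        raise ValueError("cp_size must be a multiple of gpus_per_node")
    num_nodes = cp_size // gpus_per_node
    # intra_node groups: all ranks that share one node
    intra_node = [
        list(range(n * gpus_per_node, (n + 1) * gpus_per_node))
        for n in range(num_nodes)
    ]
    # inter_node groups: one rank per node, all with the same local index
    inter_node = [
        list(range(local, cp_size, gpus_per_node))
        for local in range(gpus_per_node)
    ]
    return intra_node, inter_node


intra, inter = build_2d_cp_groups(cp_size=8, gpus_per_node=4)
print(intra)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(inter)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

The general idea behind hierarchical collectives is that data needed by several ranks on one node can cross the slower inter-node link once and then be forwarded over the faster intra-node links, which is the kind of redundant inter-node traffic the note refers to.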

MAGI_ATTENTION_FFA_FORWARD_SM_MARGIN

The number of SMs (streaming multiprocessors) that the ffa forward kernel leaves free for communication kernels, i.e. its SM margin.

MAGI_ATTENTION_FFA_BACKWARD_SM_MARGIN

The number of SMs (streaming multiprocessors) that the ffa backward kernel leaves free for communication kernels, i.e. its SM margin.
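Since a margin is a count of SMs, the kernel presumably keeps total_sms - margin SMs for itself and leaves the rest to communication kernels running concurrently. The sketch below does only that arithmetic for both margins. The helper `ffa_sm_budget`, the default of 0 when a variable is unset, and the margin values are assumptions for illustration; 132 is the SM count of an H100 SXM GPU, and on a real device the total can be queried with `torch.cuda.get_device_properties(0).multi_processor_count`.

```python
import os


def ffa_sm_budget(total_sms: int, env_var: str) -> tuple[int, int]:
    """Return (sms_for_ffa, sms_reserved_for_comm) for one margin variable.

    Hypothetical helper: assumes the margin is a plain SM count and that
    an unset variable means a margin of 0.
    """
    margin = int(os.environ.get(env_var, "0"))
    if not 0 <= margin < total_sms:
        raise ValueError(f"{env_var}={margin} must be in [0, {total_sms})")
    return total_sms - margin, margin


os.environ["MAGI_ATTENTION_FFA_FORWARD_SM_MARGIN"] = "4"
os.environ["MAGI_ATTENTION_FFA_BACKWARD_SM_MARGIN"] = "8"

# 132 SMs, as on an H100 SXM GPU (example value).
print(ffa_sm_budget(132, "MAGI_ATTENTION_FFA_FORWARD_SM_MARGIN"))  # (128, 4)
print(ffa_sm_budget(132, "MAGI_ATTENTION_FFA_BACKWARD_SM_MARGIN"))  # (124, 8)
```

A larger margin gives overlapping communication more room but leaves fewer SMs for the attention kernel itself, so the two values are a trade-off worth tuning per workload.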

MAGI_ATTENTION_FFA_FORWARD_INPLACE_CORRECT

Set this variable to 1 to enable in-place correction of out and lse in the ffa forward kernel. This avoids storing partial results and skips the memory-bound result_correction step that otherwise runs as a forward post-process.

Note

This feature will be enabled by default once it is stable (i.e. once it has no negative effect on accuracy or performance).
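When attention for a query is computed in several chunks (for example, over key/value blocks held by different ranks), each chunk yields a partial out and its log-sum-exp lse, and the partials must be rescaled into the final result. The sketch below shows the standard log-sum-exp merge in two forms: a two-pass version that stores every partial and corrects at the end (the shape of a separate result_correction post-process) and an in-place version that folds each partial into a running (out, lse). It uses plain Python lists, and all function names are hypothetical; it illustrates the general technique, not the ffa kernel's code.

```python
import math


def merge_two_pass(partials):
    """Store all (out, lse) partials, then correct once at the end."""
    lses = [lse for _, lse in partials]
    m = max(lses)
    lse = m + math.log(sum(math.exp(l - m) for l in lses))  # stable log-sum-exp
    out = [0.0] * len(partials[0][0])
    for o_i, l_i in partials:
        w = math.exp(l_i - lse)  # this chunk's share of the softmax mass
        out = [acc + w * x for acc, x in zip(out, o_i)]
    return out, lse


def merge_inplace(partials):
    """Fold each partial into a running (out, lse); no partial is kept."""
    out, lse = None, -math.inf
    for o_i, l_i in partials:
        if out is None:
            out, lse = list(o_i), l_i
            continue
        # log(exp(lse) + exp(l_i)), computed stably
        new_lse = max(lse, l_i) + math.log1p(math.exp(-abs(lse - l_i)))
        a, b = math.exp(lse - new_lse), math.exp(l_i - new_lse)
        out = [a * x + b * y for x, y in zip(out, o_i)]
        lse = new_lse
    return out, lse


def chunk_attention(scores, values):
    """Softmax attention of one query over one chunk of keys: (out, lse)."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    probs = [math.exp(s - lse) for s in scores]
    dim = len(values[0])
    out = [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]
    return out, lse


scores = [0.5, -1.0, 2.0, 0.1]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
partials = [
    chunk_attention(scores[:2], values[:2]),
    chunk_attention(scores[2:], values[2:]),
]
full = chunk_attention(scores, values)  # attention over all keys at once

out_a, _ = merge_two_pass(partials)
out_b, _ = merge_inplace(partials)
print(out_a, out_b, full[0])  # all three agree up to rounding
```

The in-place form needs only one running (out, lse) pair per query instead of one pair per chunk, which is why folding the correction into the kernel can save both memory and a separate memory-bound pass.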

MAGI_ATTENTION_FFA_BACKWARD_HIGH_PRECISION_REDUCE

Set this variable to 1 to reduce dkv among ranks in high precision (fp32) in ffa backward. This improves precision at the cost of doubling the communication overhead.

Note

Inside the ffa backward kernel, we always accumulate partial dkv in high precision (fp32). By default, however, we downcast it to the kv dtype before reducing among ranks, to decrease communication overhead.
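The trade-off can be seen with a toy reduction of one dkv element whose partial values come from several ranks. The sketch below uses fp16 as a stand-in for a 16-bit kv dtype (Python's struct module can round to fp16 but not bf16) and struct's fp32 format for the high-precision path. It also assumes that in the default path the reduction itself accumulates in the 16-bit dtype; all function names are hypothetical. Small contributions vanish when added to 1.0 in fp16, while the fp32 path keeps them until the final downcast, and the 4-byte versus 2-byte element size is where the doubled communication volume comes from.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE fp16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reduce_low_precision(partials):
    """Default path (assumed): downcast each partial, accumulate in fp16."""
    acc = 0.0
    for p in partials:
        acc = to_fp16(acc + to_fp16(p))
    return acc


def reduce_high_precision(partials):
    """High-precision path: accumulate in fp32, downcast once at the end."""
    acc = 0.0
    for p in partials:
        acc = to_fp32(acc + to_fp32(p))
    return to_fp16(acc)


def comm_bytes(num_elems: int, high_precision: bool) -> int:
    """Bytes sent for num_elems dkv elements (fp32: 4 bytes, 16-bit: 2 bytes)."""
    return num_elems * (4 if high_precision else 2)


partials = [1.0] + [0.0003] * 10  # one large partial plus ten small ones
print(reduce_low_precision(partials))  # 1.0
print(reduce_high_precision(partials))  # 1.0029296875
print(comm_bytes(1024, True) // comm_bytes(1024, False))  # 2
```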

MAGI_ATTENTION_DIST_ATTN_RUNTIME_DICT_SIZE

Set this variable to change the size of dist_attn_runtime_dict (default: 100). See magi_attention/api/magi_attn_interface.py for more information.
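This page does not describe how dist_attn_runtime_dict behaves once it is full. As an illustration of what a size limit on such a cache typically means, the sketch below implements a least-recently-used cache whose capacity is read from the variable, falling back to the documented default of 100. The class `RuntimeLRUDict`, its `get`/`put` methods, the string keys, and the LRU eviction policy are all assumptions; check magi_attn_interface.py for the actual behavior.

```python
import os
from collections import OrderedDict


def runtime_dict_size() -> int:
    """Read the capacity, falling back to the documented default of 100."""
    return int(os.environ.get("MAGI_ATTENTION_DIST_ATTN_RUNTIME_DICT_SIZE", "100"))


class RuntimeLRUDict:
    """Hypothetical bounded cache that evicts the least recently used entry."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the oldest entry

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


os.environ["MAGI_ATTENTION_DIST_ATTN_RUNTIME_DICT_SIZE"] = "2"
cache = RuntimeLRUDict(runtime_dict_size())
cache.put("mask_a", "runtime_a")
cache.put("mask_b", "runtime_b")
cache.get("mask_a")  # touch mask_a, so mask_b becomes the oldest entry
cache.put("mask_c", "runtime_c")  # exceeds the limit and evicts mask_b
print("mask_b" in cache, len(cache))  # False 2
```

Under this LRU assumption, a larger size keeps more entries alive, trading memory for fewer rebuilds of evicted entries.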

For Debug

MAGI_ATTENTION_SANITY_CHECK

Set MAGI_ATTENTION_SANITY_CHECK to 1 to enable the many sanity checks built into magi_attention.

Note

This is intended only for testing or debugging, since the overhead of the extra sanity checks can be non-negligible.
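A common way to keep such checks out of the fast path is to gate them on the flag. In the sketch below, the check itself (validating that hypothetical [start, end) ranges are non-empty, sorted, non-overlapping, and inside the sequence) is an invented example of the kind of per-call validation that adds overhead when enabled; it is not one of magi_attention's actual checks, and `sanity_check_enabled`, `check_ranges`, and `process_ranges` are hypothetical names.

```python
import os


def sanity_check_enabled() -> bool:
    """Hypothetical helper: checks run only when the flag is exactly "1"."""
    return os.environ.get("MAGI_ATTENTION_SANITY_CHECK", "0") == "1"


def check_ranges(ranges, seqlen):
    """Example check: sorted, non-empty, non-overlapping [start, end) ranges."""
    prev_end = 0
    for start, end in ranges:
        if not (prev_end <= start < end <= seqlen):
            raise ValueError(f"invalid range [{start}, {end}) for seqlen={seqlen}")
        prev_end = end


def process_ranges(ranges, seqlen):
    if sanity_check_enabled():
        check_ranges(ranges, seqlen)  # extra O(len(ranges)) work per call
    return len(ranges)  # stand-in for the real work


os.environ["MAGI_ATTENTION_SANITY_CHECK"] = "0"
process_ranges([(0, 8), (4, 12)], seqlen=16)  # overlapping, but not checked

os.environ["MAGI_ATTENTION_SANITY_CHECK"] = "1"
try:
    process_ranges([(0, 8), (4, 12)], seqlen=16)
except ValueError as err:
    print(err)  # invalid range [4, 12) for seqlen=16
```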

MAGI_ATTENTION_SDPA_BACKEND

Set MAGI_ATTENTION_SDPA_BACKEND to 1 to switch the attention kernel backend from ffa to sdpa-math, which supports higher precisions such as fp32 and fp64.

Note

This is intended only for testing or debugging, since its performance is not acceptable for regular use.
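For context, a "math" SDPA backend (presumably PyTorch's math implementation of scaled_dot_product_attention) evaluates attention directly from its definition, softmax(Q Kᵀ / sqrt(d)) V, without a fused kernel, which is why it can run in fp32 or fp64 but is slow. The sketch below is a plain-Python float64 version of that formula, the kind of high-precision reference one might compare ffa outputs against in a test. `reference_attention` is a hypothetical name, and the sketch calls neither magi_attention nor PyTorch.

```python
import math


def reference_attention(q, k, v, causal=False):
    """softmax(q @ k^T / sqrt(d)) @ v in float64, for lists of row vectors."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if causal:
            # query i may only attend to keys 0..i (assumes len(q) == len(k))
            scores = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        row = [
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ]
        out.append(row)
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(reference_attention(q, k, v, causal=True)[0])  # [1.0, 2.0]
```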

MAGI_ATTENTION_DETERMINISTIC_MODE

Set MAGI_ATTENTION_DETERMINISTIC_MODE to 1 to enable deterministic mode, in which all magi_attention kernels use deterministic algorithms.
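Determinism matters because floating-point addition is not associative: when partial results are combined in an order that varies from run to run (for example, through atomic adds), the result can differ between runs. The sketch below shows the effect in plain Python and the usual remedy of fixing the reduction order. How magi_attention's kernels achieve determinism is not described on this page, so this only illustrates the general idea; `sum_in_order` and `deterministic_reduce` are hypothetical names.

```python
def sum_in_order(values, order):
    """Accumulate values in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def deterministic_reduce(arrivals):
    """arrivals: (rank, value) pairs in whatever order they happened to arrive."""
    total = 0.0
    for _, value in sorted(arrivals):  # always combine in rank order
        total += value
    return total


print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

values = [0.1, 0.2, 0.3, 1e16, -1e16, 1.0]
run_a = sum_in_order(values, [0, 1, 2, 3, 4, 5])  # small terms absorbed by 1e16
run_b = sum_in_order(values, [3, 4, 0, 1, 2, 5])  # large terms cancel first
print(run_a, run_a == run_b)  # 1.0 False

# Two different arrival orders give bitwise-identical results once the
# reduction order is fixed.
arrivals = list(enumerate(values))
shuffled = [arrivals[i] for i in [3, 4, 0, 1, 2, 5]]
print(deterministic_reduce(arrivals) == deterministic_reduce(shuffled))  # True
```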