Adjust MFU to account for FP8 #560

Open
lessw2020 opened this issue Aug 23, 2024 · 1 comment

@lessw2020
Contributor

From internal discussions, logging an issue to update our MFU calculation so that when FP8 is used, we report an accurate MFU number.

At the moment, FP8 replaces wq/wk/wv/wo in Attention and w1/w2/w3 in the MLP.

Thus, the calculation needs to be adjusted: MFU is measured against the accelerator's peak FLOPS, and the FP8 peak is 2x the BF16 peak, so dividing FP8 throughput by the BF16 peak overstates utilization.
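
A minimal sketch of one way to adjust the denominator, assuming H100 SXM dense (non-sparse) tensor-core peaks of 989 BF16 / 1979 FP8 TFLOPS; the function names and the idea of weighting by FLOP fraction are illustrative, not an existing torchtitan API:

```python
# Sketch only: an FP8-aware MFU denominator, not torchtitan's implementation.
# Assumes H100 SXM dense peaks, and that the caller can estimate what fraction
# of model FLOPs land in the FP8 linears (wq/wk/wv/wo, w1/w2/w3) vs. the BF16
# remainder (attention score matmuls, norms, embeddings, etc.).

BF16_PEAK_TFLOPS = 989.0   # H100 SXM, dense (no sparsity)
FP8_PEAK_TFLOPS = 1979.0   # H100 SXM, dense; 2x the BF16 peak


def effective_peak_tflops(fp8_flop_fraction: float) -> float:
    """FLOP-weighted harmonic mean of the FP8 and BF16 peaks.

    The ideal step time for F total FLOPs with a fraction f in FP8 is
        f * F / FP8_PEAK + (1 - f) * F / BF16_PEAK,
    so the effective peak is the reciprocal of the weighted reciprocals.
    """
    f = fp8_flop_fraction
    return 1.0 / (f / FP8_PEAK_TFLOPS + (1.0 - f) / BF16_PEAK_TFLOPS)


def mfu(achieved_tflops: float, fp8_flop_fraction: float = 0.0) -> float:
    """Achieved throughput over the dtype-adjusted peak."""
    return achieved_tflops / effective_peak_tflops(fp8_flop_fraction)
```

With `fp8_flop_fraction = 0` this reduces to the current BF16 MFU; with `fp8_flop_fraction = 1` it uses the full FP8 peak, halving the reported MFU for the same achieved TFLOPS.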

In addition, we would like to pull the proper peak FLOPS (FP8 or BF16) based on the training config being run, so this is handled automatically for the user.
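
Extending the sketch above (reusing its constants and `effective_peak_tflops`), the selection could key off the training config; the `enable_float8_linear` flag name is hypothetical, not necessarily torchtitan's actual config field:

```python
def peak_tflops_from_config(enable_float8_linear: bool,
                            fp8_flop_fraction: float = 1.0) -> float:
    # Hypothetical dispatch: use the FLOP-weighted effective peak when the
    # FP8 linears are enabled, and the plain BF16 peak otherwise.
    if enable_float8_linear:
        return effective_peak_tflops(fp8_flop_fraction)
    return BF16_PEAK_TFLOPS
```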

@raghukiran1224

Yes, this needs to be updated. The MFU computations for fp8 are too good to be true :)
CC: @lchu-ibm
