
Is FlashAttention really used when using HuggingFaceModel as one of the supported ComposerModel types? #2564

Open
harishankar-gopalan opened this issue Sep 25, 2023 · 3 comments
Labels
enhancement New (engineering) enhancements, such as features or API changes.

Comments

@harishankar-gopalan

Given that, from PyTorch 2.0 onward, the dynamic dispatch to FlashAttention happens when the required conditions are satisfied, I cannot find a way to verify whether FlashAttention is actually used by default. Also, because the general GPT recipes depend on HuggingFace models, which do not seem to use PyTorch's F.scaled_dot_product_attention, I am wondering whether FlashAttention is really used when training with Composer. Any ideas on how to easily enable FlashAttention when using an HF model together with Composer?
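For reference, here is a minimal sketch (tensor shapes, dtype, and sequence length are illustrative) of how one can check that PyTorch's SDPA dispatcher is able to pick the FlashAttention kernel for a given set of inputs, using the torch.backends.cuda.sdp_kernel context manager available in PyTorch 2.0/2.1:

```python
# Minimal check: can SDPA dispatch to FlashAttention for these inputs?
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim) in fp16 on GPU,
# which is the kind of input FlashAttention can handle.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Restrict dispatch to the FlashAttention backend only. If the inputs do not
# meet FlashAttention's constraints (dtype, head dim, masking), this raises a
# RuntimeError instead of silently falling back to the math kernel.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print("Dispatched to the FlashAttention backend")
```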

@harishankar-gopalan added the enhancement label Sep 25, 2023
@snarayan21
Contributor

Hey, we'd recommend that you use our llm-foundry repo, which uses composer extensively and also supports using HF models. Check it out here!

@harishankar-gopalan
Author

harishankar-gopalan commented Sep 26, 2023

> Hey, we'd recommend that you use our llm-foundry repo, which uses composer extensively and also supports using HF models. Check it out here!

Hi @snarayan21, thanks for the response. This however does not answer my original question. Even in llm-foundry, when HuggingFace models are used for the recipes, I do not see any functionality that ensures the attention computation goes through PyTorch's F.scaled_dot_product_attention, which is what dispatches to FlashAttention or memory-efficient attention when the current model parameters allow it. Any insights into this?
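For context, something like the following is roughly what I would like to be able to do. This is only a sketch: it assumes a transformers release that accepts the attn_implementation argument (e.g. "sdpa" or "flash_attention_2"), and the checkpoint name is just a placeholder.

```python
# Hedged sketch: load an HF model so its attention goes through
# F.scaled_dot_product_attention, then wrap it for Composer.
# Assumes a transformers version that supports attn_implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from composer.models import HuggingFaceModel

name = "gpt2"  # placeholder checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained(name)
hf_model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",  # route attention through F.scaled_dot_product_attention
)

# Composer wraps the HF model as-is and does not change the attention
# implementation selected above.
composer_model = HuggingFaceModel(hf_model, tokenizer=tokenizer, use_logits=True)
```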

@snarayan21
Contributor

Hey, so there are three cases you'll run into when using llm-foundry:

First, using an MPT model. This has configurable attention and supports FlashAttention (see the sketch at the end of this comment).
Second, using a Llama model. There is an option to patch in FlashAttention, as configured in llm-foundry.
Third, using a HuggingFace model. Foundry will use whatever attention implementation the underlying HuggingFace model uses.

You can see our attention implementations in foundry in this folder. Hope this helps!
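For the MPT case, the attention implementation is selected through the model's attn_config. A minimal sketch following the pattern documented on the MPT model cards (checkpoint name and dtype are illustrative):

```python
# Hedged sketch: select MPT's attention implementation via attn_config,
# following the pattern from the MPT model cards.
import torch
import transformers

name = "mosaicml/mpt-7b"  # illustrative checkpoint
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"  # or "flash" / "torch", depending on what is installed

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

In the llm-foundry training YAMLs the same knob is exposed under the model section's attn_config, if I recall correctly, so you can set it there instead of in Python.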

@snarayan21 snarayan21 reopened this Sep 27, 2023