Hi, I noticed a mistake in the description of cross attention in section 2.3, Attention in LLMs:
Cross Attention: It is used in encoder-decoder architectures, where encoder outputs are the queries, and key-value pairs come from the decoder.
In reality it is the other way around: the queries come from the decoder self-attention, while the encoder outputs act as keys and values. See also here or here, chapter 3.2.3:
In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
and the memory keys and values come from the output of the encoder.
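To illustrate the corrected direction, here is a minimal sketch of a cross-attention call (PyTorch, with hypothetical shapes and dimensions not taken from the survey): the decoder states supply the queries, and the encoder output supplies both keys and values.

```python
import torch
import torch.nn as nn

# Hypothetical model size and head count, for illustration only.
d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

decoder_states = torch.randn(2, 10, d_model)   # from the previous decoder layer (self-attention output)
encoder_output = torch.randn(2, 20, d_model)   # "memory" produced by the encoder

# Cross attention: query = decoder states, key = value = encoder output.
out, _ = cross_attn(query=decoder_states, key=encoder_output, value=encoder_output)
print(out.shape)  # torch.Size([2, 10, 512]) -- one output per decoder position
```

Note that the output has the decoder's sequence length, since there is one attended result per query (i.e. per decoder position), which is the opposite of what the current text in 2.3 implies.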