Hi, I noticed a mistake in the description of cross attention in section 2.3, Attention in LLMs:
Cross Attention: It is used in encoder-decoder architectures, where encoder outputs are the queries, and key-value pairs come from the decoder.
In reality it is the other way around: the queries come from the decoder self-attention, while the encoder outputs act as keys and values. See also here or here, chapter 3.2.3:
In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
and the memory keys and values come from the output of the encoder.
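To illustrate the corrected direction, here is a minimal sketch of a cross-attention call (PyTorch, with hypothetical shapes and dimensions not taken from the survey): the decoder states supply the queries, and the encoder output supplies both keys and values.

```python
import torch
import torch.nn as nn

# Hypothetical model size and head count, for illustration only.
d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

decoder_states = torch.randn(2, 10, d_model)   # from the previous decoder layer (self-attention output)
encoder_output = torch.randn(2, 20, d_model)   # "memory" produced by the encoder

# Cross attention: query = decoder states, key = value = encoder output.
out, _ = cross_attn(query=decoder_states, key=encoder_output, value=encoder_output)
print(out.shape)  # torch.Size([2, 10, 512]) -- one output per decoder position
```

Note that the output has the decoder's sequence length, since there is one attended result per query (i.e. per decoder position), which is the opposite of what the current text in 2.3 implies.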