Attention Pooling

astonzhang · December 23, 2020, 3:03am

https://d2l.ai/chapter_attention-mechanisms-and-transformers/attention-pooling.html

wuncc1 · March 31, 2021, 1:30am

Hi,
It’s Watson (not Waston).
(Otherwise, enjoying the book).

A better idea was proposed by Nadaraya [Nadaraya, 1964] and Waston

StevenJokess · March 31, 2021, 3:34am

You are right! @wuncc1

I’m opening a pull request for it. Next time you can contribute the fix yourself; check the tutorial

MRAB · May 11, 2021, 11:55pm

Hello,

Will TensorFlow code be available for these sections as well (and for the other chapters that are missing it)?

joegenius98 · January 17, 2022, 9:26pm

Hello, everyone, I hope everyone’s having a good day.

I was wondering why the circled statement is true. Because both the denominator and numerator are negative, the end-result fraction is actually positive. So, should it not be the farther away the query x is, the higher the attention weight?

TusakaRin · March 6, 2022, 11:37am

The denominator and numerator are both positive, because the negative square is passed into exp(), and exp() of any real number is positive. A larger difference between xi and x leads to a larger squared difference, then a smaller exp value, and finally a smaller weight.
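
A quick numerical check may make this concrete. The sketch below assumes the weight has the Gaussian-kernel form exp(-(x - xi)^2 / 2) divided by the sum of the same term over all keys; the function name `gaussian_attention_weights`, the query value, and the key values are made up for illustration and are not taken from the book's code.

```python
import math

def gaussian_attention_weights(query, keys):
    """Softmax over the scores -(query - key)^2 / 2 (assumed kernel form)."""
    # Scores are zero or negative, but they are only exponents.
    scores = [-0.5 * (query - key) ** 2 for key in keys]
    # Subtracting the max score leaves the softmax unchanged and avoids underflow.
    max_score = max(scores)
    exps = [math.exp(score - max_score) for score in scores]
    # exp() is always positive, so numerator and denominator are both positive.
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical keys at increasing distance from the query x = 0.0.
keys = [0.0, 1.0, 2.0, 3.0]
weights = gaussian_attention_weights(0.0, keys)

for key, weight in zip(keys, weights):
    print(f"key={key:.1f}  weight={weight:.4f}")
```

The printed weights are all positive, sum to 1, and shrink as the key moves farther from the query, which is the behavior described in the circled statement.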