Attention Pooling - mxnet - D2L Discussion

Mar '21

wuncc1

Hi,
It’s Watson (not Waston).
(Otherwise, enjoying the book).

A better idea was proposed by Nadaraya [Nadaraya, 1964] and Waston

Mar '21

StevenJokess

You are right! @wuncc1

I’m opening a pull request for it. Next time you can contribute the fix yourself; check the tutorial.

May '21

MRAB

Hello,

Will TensorFlow code be available for these sections as well (and for other chapters that are missing it)?

Jan '22

joegenius98

Hello, everyone, I hope everyone’s having a good day.

I was wondering why the circled statement is true. Because both the denominator and numerator are negative, the end-result fraction is actually positive. So, should it not be the farther away the query x is, the higher the attention weight?

Mar '22 ▶ joegenius98

TusakaRin

The denominator and numerator are both positive, because the negative squared difference is passed into exp(), and exp() of any real number is positive. A larger difference between x_i and x leads to a larger squared difference, then a smaller exp value, and finally a smaller weight.
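
To make this concrete, here is a minimal NumPy sketch. It assumes the weights in question are the Gaussian-kernel ones, i.e. a softmax over -0.5 * (x - x_i)^2. The function name `gaussian_attention_weights` and the sample keys are made up for illustration and are not taken from the book's code.

```python
import numpy as np

def gaussian_attention_weights(query, keys):
    """Softmax over -0.5 * (query - key)^2, giving one weight per key."""
    scores = -0.5 * (query - keys) ** 2          # always <= 0
    # exp() of any real number is positive; subtracting the max is only
    # for numerical stability and does not change the resulting weights.
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()

# Hypothetical keys x_i and a query x = 1.2
keys = np.array([0.0, 1.0, 2.0, 5.0])
weights = gaussian_attention_weights(1.2, keys)
print(weights)
```

The scores are negative, but every weight comes out positive, the weights sum to 1, and the key closest to the query (1.0) gets the largest weight while the farthest one (5.0) gets the smallest.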