Attention Pooling: Nadaraya-Watson Kernel Regression

It’s Watson (not Waston).
(Otherwise, enjoying the book).

A better idea was proposed by Nadaraya [Nadaraya, 1964] and Waston

You are right! @wuncc1

I’m submitting a pull request to fix it. Next time you can contribute the fix yourself; check the tutorial.


Will TensorFlow code be available for these sections as well (and for other chapters that are missing it)?