https://d2l.ai/chapter_attention-mechanisms-and-transformers/attention-cues.html

Great book!

And about the answer to exercise 1,

`1. What can be the volitional cue when decoding a sequence token by token in machine translation? What are the nonvolitional cues and the sensory inputs?`

I think the sensory inputs are the embeddings of the tokens, but what is the volitional cue?

The nonvolitional cues are the source input sequence.

The volitional cues are the predicted/translated output tokens at each time step.

The sensory inputs are the target-language dictionary.

```
import torch

attention_weights = torch.rand(10, 10)  # a random 10x10 score matrix
m = torch.nn.Softmax(dim=0)             # softmax along dim 0
out = m(attention_weights)              # normalize the scores
# the following two lines are borrowed from the example in the chapter
attention_weights = out.reshape((1, 1, 10, 10))
show_heatmaps(attention_weights, xlabel='Keys', ylabel='Queries')
```
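One detail worth double-checking in the snippet above: `Softmax(dim=0)` normalizes down the columns, i.e. over the queries, whereas attention pooling normalizes each query's weights over the keys (`dim=1` for this matrix). A quick sanity check:

```python
import torch

torch.manual_seed(0)
scores = torch.rand(10, 10)  # rows = queries, columns = keys

over_queries = torch.softmax(scores, dim=0)  # each column sums to 1
over_keys = torch.softmax(scores, dim=1)     # each row sums to 1

print(over_queries.sum(dim=0))  # a vector of ones
print(over_keys.sum(dim=1))     # a vector of ones
```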

I think the volitional cue here is the decoder state at each time step, and the sensory inputs are the hidden states from each encoding time step. You bias the selection by taking a weighted average of the values (the hidden states from the encoding time steps) for a given volitional cue (i.e., the hidden state of the decoder at a given time step).
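The pooling described above can be sketched in a few lines (a minimal illustration with made-up sizes and dot-product scoring; a real model would use a learned scoring function):

```python
import torch

torch.manual_seed(0)
num_steps, hidden_dim = 6, 8                    # hypothetical sizes
enc_states = torch.rand(num_steps, hidden_dim)  # sensory inputs: one hidden state per encoding step
dec_state = torch.rand(hidden_dim)              # volitional cue: decoder state at the current step

# score each encoder state against the decoder state
scores = enc_states @ dec_state                 # shape: (num_steps,)
weights = torch.softmax(scores, dim=0)          # attention weights, sum to 1

# context = weighted average of the values (the encoder hidden states)
context = weights @ enc_states                  # shape: (hidden_dim,)
```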

In my view, the encoder hidden states could be the volitional cues (queries), and the hidden states of the decoder the nonvolitional cues. The encoder hidden states store the information of the original input text, so this information should be able to bias the hidden states of the decoder. As a result, the decoder can focus on hidden states that make sense given the encoder hidden states.

**Question 1.**

What can be the volitional cue when decoding a sequence token by token in machine translation? What are the nonvolitional cues and the sensory inputs?

The **volitional cues** (queries) are the desired "words" to be translated (e.g., "egg").

The **nonvolitional cues** (keys) are the training input "words" paired with the output words (e.g., "egg" in the egg-huevo pair, where "huevo" is the Spanish word for "egg").

The **sensory inputs** (values) are the training output "words" paired with the input words (e.g., "huevo" in the egg-huevo pair).
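The egg-huevo mapping can be made concrete as a degenerate "attention" lookup over a toy translation memory (purely illustrative; the word pairs and the exact-match scoring are made up for this sketch):

```python
import numpy as np

# toy translation memory: keys are English words, values are Spanish words
pairs = {"egg": "huevo", "milk": "leche", "bread": "pan"}
keys, values = list(pairs), list(pairs.values())

def hard_lookup(query):
    # exact-match scores: 1 for the matching key, 0 elsewhere
    scores = np.array([1.0 if k == query else 0.0 for k in keys])
    weights = scores / scores.sum()  # degenerate "attention": all mass on one key
    # select the value whose key received all the weight
    return values[int(np.argmax(weights))]

print(hard_lookup("egg"))  # huevo
```

With a soft similarity instead of exact match, the weights would spread over several key-value pairs, which is the usual attention-pooling picture.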

Hi all, I noticed that the content about the volitional cues, etc. has been deleted in the new version. Just wondering if the figurative concepts still hold, and if so, where can I get the previous version? Thanks!

About exercise 1: because approximate matching is based on the distance between records, I think we can use distance functions. For instance, we can use the Levenshtein distance for strings.

Q1:

I have coded a very naive recursive Levenshtein string comparison:

```
import numpy as np

D = {'pagina': 'livro', 'cidade': 'pais', 'paginacao': 'encadernacao'}

def lev(u, v, i, j):
    # distance between the first i characters of u and the first j of v
    if min(i, j) == 0:
        return max(i, j)
    r1 = lev(u, v, i - 1, j) + 1  # deletion
    r2 = lev(u, v, i, j - 1) + 1  # insertion
    r3 = lev(u, v, i - 1, j - 1) + (0 if u[i - 1] == v[j - 1] else 1)  # substitution
    return min(r1, r2, r3)

def comp(u, v):
    """The Levenshtein distance between u and v."""
    return lev(u, v, len(u), len(v))

def AttentionWeights(q, D):
    wcomp = np.zeros(len(D))
    for i, (k, v) in enumerate(D.items()):
        wcomp[i] = comp(q, k)
    # naive normalization: closer keys get larger weights
    return 1 - wcomp / np.sum(wcomp)

print(AttentionWeights("pagina", D))
```
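As a side note, the recursive formulation recomputes the same subproblems over and over and takes exponential time in the string lengths; the standard dynamic-programming version is quadratic. A sketch (same semantics; the name `lev_dp` is mine):

```python
import numpy as np

def lev_dp(u, v):
    """Levenshtein distance via dynamic programming, O(len(u) * len(v))."""
    m, n = len(u), len(v)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)  # deleting every character of a prefix of u
    d[0, :] = np.arange(n + 1)  # inserting every character of a prefix of v
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if u[i - 1] == v[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution
    return int(d[m, n])

print(lev_dp("pagina", "paginacao"))  # 3
```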