Attention Cues

Great book!
Regarding the answer to exercise 1:
1. What can be the volitional cue when decoding a sequence token by token in machine translation? What are the nonvolitional cues and the sensory inputs?
I think the sensory inputs are the embeddings of the tokens, but what is the volitional cue?

The nonvolitional cues are the source input sequence.
The volitional cues are the predicted/translated output tokens at each time step.
The sensory inputs are the target-language dictionary.
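One way to make the cue analogy concrete is to map it onto the query/key/value roles in attention. The sketch below is my own illustration (the tensor shapes and names are assumptions, not the book's code): the query plays the role of the volitional cue, the keys act as the nonvolitional cues, and the values are the sensory inputs that actually get aggregated.

```python
import torch

torch.manual_seed(0)

# volitional cue: the decoder's current state acts as a query
query = torch.rand(1, 4)    # one decoding step, feature size 4
# nonvolitional cues: representations of the source tokens act as keys
keys = torch.rand(6, 4)     # 6 source tokens
# sensory inputs: the values that are aggregated into the output
values = torch.rand(6, 4)   # one value per source token

scores = query @ keys.T                   # (1, 6) similarity scores
weights = torch.softmax(scores, dim=-1)   # attention weights sum to 1
output = weights @ values                 # (1, 4) weighted sum of values
```

The softmax guarantees that the weights over the six source tokens sum to 1, so the output is a convex combination of the values.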

import torch  # show_heatmaps is assumed to be defined as in the chapter

attention_weights = torch.rand(10, 10)  # generate a 10x10 random matrix
m = torch.nn.Softmax(dim=1)  # softmax along dim 1 so each row sums to 1
out = m(attention_weights)  # apply softmax

# the following two lines are simply borrowed from the example
attention_weights = out.reshape((1, 1, 10, 10))
show_heatmaps(attention_weights, xlabel='Keys', ylabel='Queries')