Attention Cues

https://d2l.ai/chapter_attention-mechanisms-and-transformers/attention-cues.html

Great book!
About the answer to exercise 1:
1. What can be the volitional cue when decoding a sequence token by token in machine translation? What are the nonvolitional cues and the sensory inputs?
I think the sensory inputs are the embeddings of the tokens, but what is the volitional cue?

The nonvolitional cues are the source input sequence.
The volitional cues are the predicted/translated output tokens at each time step.
The sensory inputs are the target-language dictionary (vocabulary).

import torch
from d2l import torch as d2l  # for d2l.show_heatmaps

attention_weights = torch.rand(10, 10)     # generate a random 10x10 matrix (queries x keys)
m = torch.nn.Softmax(dim=1)                # softmax over dim 1, so each row sums to 1
out = m(attention_weights)                 # apply softmax to get valid attention weights per query

# the following two lines are simply borrowed from the example
attention_weights = out.reshape((1, 1, 10, 10))
d2l.show_heatmaps(attention_weights, xlabel='Keys', ylabel='Queries')
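With xlabel='Keys' and ylabel='Queries', each row of the heatmap is one query's attention distribution over the ten keys; since the matrix is random, you should see no structure beyond the row-wise normalization.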

I think the volitional cue here is the decoder state at each time step, and the sensory inputs are the hidden states from each encoding time step. You bias the selection by taking a weighted average of the values (the hidden states from the encoding time steps) for a given volitional cue (i.e., the hidden state of the decoder at a given time step); see the sketch below.
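To make this concrete, here is a minimal toy sketch of mine (not from the book) of one decoding step: the decoder hidden state plays the query, the encoder hidden states play both keys and values, and the context is their weighted average.

import torch

# hypothetical sizes: 5 encoder time steps, hidden size 8
enc_states = torch.randn(5, 8)   # keys and values: encoder hidden states
dec_state = torch.randn(8)       # query: decoder hidden state at the current step

scores = enc_states @ dec_state          # dot-product score of the query against each key, shape (5,)
weights = torch.softmax(scores, dim=0)   # attention weights over the 5 encoder steps
context = weights @ enc_states           # weighted average of the values, shape (8,)
print(weights, context.shape)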

From my point of view, the encoder hidden states could be the volitional cues (queries), and the hidden states of the decoder are the nonvolitional cues. I think the encoder hidden states store the information of the original input text, so this information should be able to bias the hidden states of the decoder. As a result, the decoder can focus on hidden states that make sense according to the encoder hidden states.

Question 1.
What can be the volitional cue when decoding a sequence token by token in machine translation? What are the nonvolitional cues and the sensory inputs?

The volitional cues (queries) are the desired “words” to be translated (e.g., “egg”).

The nonvolitional cues (keys) are the training input “words” paired with the output words (e.g., “egg” in the egg-huevo pair; huevo is the Spanish word for egg).

The sensory inputs (values) are the training output “words” paired with the input words (e.g., “huevo” in the egg-huevo pair).
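Taken literally, this answer treats translation as a lookup in a key-value store of word pairs. A minimal toy sketch of mine (made-up word pairs, not the book's code) where an exact key match receives all of the attention weight:

# toy bilingual "database": keys are English words, values are Spanish words
pairs = {'egg': 'huevo', 'book': 'libro', 'city': 'ciudad'}

def attend(query, pairs):
    # nonvolitional cues (keys) vs. the volitional cue (query): one-hot weights on an exact match
    weights = [1.0 if key == query else 0.0 for key in pairs]
    # sensory inputs (values): the paired target-language words
    values = list(pairs.values())
    # the "weighted combination" degenerates to picking the matching value
    return [v for w, v in zip(weights, values) if w > 0.0]

print(attend('egg', pairs))   # ['huevo']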

Hi all, I noticed that the content about volitional cues, etc. has been removed in the new version. Just wondering if the figurative concepts still hold, and if so, where can I find the previous version? Thanks!

About exercise 1: because approximate matching is based on the distance between records, I think we can use distance functions between the query and the keys. For instance, we can use the Levenshtein distance for strings (a small sketch of turning distances into weights follows).
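As a minimal sketch (my own assumption about how such a distance could be used, not the book's solution), any string distance can be turned into attention weights by taking a softmax of the negative distances, so that closer keys receive larger weights; here difflib from the standard library serves as a stand-in distance:

import difflib
import numpy as np

def string_distance(a, b):
    # 1 minus difflib's similarity ratio: 0 for identical strings, approaching 1 for very different ones
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def attention_weights(query, keys):
    # softmax over negative distances: smaller distance -> larger weight, and the weights sum to 1
    d = np.array([string_distance(query, k) for k in keys])
    e = np.exp(-d)
    return e / e.sum()

keys = ['pagina', 'cidade', 'paginacao']
print(attention_weights('pagina', keys))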

Q1:
I have coded a very naive Levenshtein string comparison:

import numpy as np

D = {'pagina': 'livro', 'cidade': 'pais', 'paginacao': 'encadernacao'}


def lev(u, v, i, j):
    # edit distance between the first i characters of u and the first j characters of v
    # (naive exponential recursion; fine for short strings)
    if min(i, j) == 0:
        return max(i, j)
    r1 = lev(u, v, i-1, j) + 1                                  # deletion
    r2 = lev(u, v, i, j-1) + 1                                  # insertion
    r3 = lev(u, v, i-1, j-1) + (0 if u[i-1] == v[j-1] else 1)   # substitution (or match)
    return min(r1, r2, r3)


def comp(u, v):
    """The Levenshtein distance between u and v."""
    return lev(u, v, len(u), len(v))


def AttentionWeights(q, D):
    # distance of the query to every key of the dictionary
    wcomp = np.zeros(len(D))
    for i, (k, v) in enumerate(D.items()):
        wcomp[i] = comp(q, k)
    # closer keys get larger weights (note: these weights do not sum to 1)
    return 1 - wcomp / np.sum(wcomp)


print(AttentionWeights("pagina", D))

My solutions to the exercises of 11.1:
