Attention Cues

astonzhang · December 22, 2020, 1:47am

https://d2l.ai/chapter_attention-mechanisms-and-transformers/attention-cues.html

Terguun_Zoregtiin · July 23, 2021, 6:52am

Great book!
About the answer to exercise 1:
1. What can be the volitional cue when decoding a sequence token by token in machine translation? What are the nonvolitional cues and the sensory inputs?
I think the sensory inputs are the embeddings of the tokens, but what is the volitional cue?

zppet · September 6, 2021, 9:42am

The nonvolitional cues are the source input sequence.
The volitional cues are the predicted/translated output sequence at each time step.
The sensory inputs are the target language dictionary.

captainst · September 23, 2021, 2:10pm

import torch

attention_weights = torch.rand(10, 10) # generate a 10x10 random matrix
m = torch.nn.Softmax(dim=1) # softmax over dim 1, so that each row (query) sums to 1
out = m(attention_weights) # apply softmax

# the following two lines are simply borrowed from the example in the book;
# show_heatmaps is the helper defined in that section
attention_weights = out.reshape((1, 1, 10, 10))
show_heatmaps(attention_weights, xlabel='Keys', ylabel='Queries')

Bay_Leaf · October 7, 2021, 1:45am

I think the volitional cue here is the decoder state at each time step, and the sensory inputs are the hidden states from each encoding time step. You bias the selection by taking a weighted average of the values (the hidden states from each encoding time step) for a given volitional cue (i.e., the hidden state of the decoder at a given time step).
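
A minimal sketch of this reading, assuming dot-product scoring and toy 2-dimensional hidden states. The numbers and the helper names `softmax` and `attend` are made up for illustration and are not taken from the book:

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each key against the query with a dot product (an assumption;
    # other scoring functions are possible)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # The context is the weighted average of the values
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Toy encoder hidden states, used here as both keys and values
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy decoder hidden state at the current time step, used as the query
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states, encoder_states)
print(weights)
print(context)
```

The encoder states that align best with the decoder state receive the largest weights, so they dominate the averaged context.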

Isaacwu0718 · May 13, 2022, 6:01am

In my view, the encoder hidden states could be the volitional cues (queries), and the hidden states of the decoder are the nonvolitional cues. I think the encoder hidden states store the information of the original input text, so this information should be able to bias the hidden states of the decoder. As a result, the decoder can focus on hidden states that make sense according to the encoder hidden states.

CristoJV · October 10, 2022, 4:59pm

Question 1.
What can be the volitional cue when decoding a sequence token by token in machine translation? What are the nonvolitional cues and the sensory inputs?

The volitional cues (queries) are the desired “words” for conversion (e.g., “egg”).

The nonvolitional cues (keys) are the training input “words” paired with the output words (e.g., “egg” in the egg-huevo pair; huevo is the Spanish word for egg).

The sensory inputs (values) are the training output “words” paired with the input words (e.g., “huevo” in the egg-huevo pair).

superduper · December 30, 2022, 12:25am

Hi all, I noticed that the content about the volitional cues, etc. has been deleted in the new version. Just wondering if the figurative concepts still hold, and if so, where can I get the previous version? Thanks!

tan · January 27, 2023, 8:19am

About exercise 1: because the algorithm for approximate matches is based on the distance between records, I think we can use distance functions. For instance, we can use the Levenshtein distance for strings.

gilzamir18 · April 25, 2023, 1:02pm

Q1:
I have coded a very naive Levenshtein string comparison:

import math
import numpy as np

D = {'pagina':'livro', 'cidade':'pais', 'paginacao':'encadernacao'}


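def lev(u, v, i, j):
    # Distance between the first i characters of u and the first j characters of v
    if min(i, j) == 0:
        return max(i, j)
    else:
        r1 = lev(u, v, i-1, j) + 1  # delete a character from u
        r2 = lev(u, v, i, j-1) + 1  # insert a character into u
        r3 = lev(u, v, i-1, j-1) + (0 if u[i-1] == v[j-1] else 1)  # match or substitute
        return min(r1, r2, r3)

def comp(u, v):
    """
    The Levenshtein distance between u and v
    """
    return lev(u, v, len(u), len(v))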

def AttentionWeights(q, D):
    wcomp = np.zeros(len(D))
    for i, (k, v) in enumerate(D.items()):
        wcomp[i] = comp(q, k)
    return 1 - wcomp/np.sum(wcomp)
print(AttentionWeights("pagina", D))
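
One possible variation, assuming we want weights that are non-negative and sum to 1: apply a softmax to the negative distances, so that closer keys get larger weights. This is only a sketch. The helper names `levenshtein` and `distance_attention` are hypothetical, and the distance is computed with an iterative dynamic program instead of the naive recursion:

```python
import math

def levenshtein(u, v):
    # prev[j] holds the distance between the processed prefix of u and v[:j]
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, start=1):
        curr = [i]
        for j, cv in enumerate(v, start=1):
            curr.append(min(prev[j] + 1,                # delete from u
                            curr[j - 1] + 1,            # insert into u
                            prev[j - 1] + (cu != cv)))  # match or substitute
        prev = curr
    return prev[-1]

def distance_attention(q, D):
    # Softmax over negative distances: a smaller distance gives a larger weight
    scores = [-levenshtein(q, k) for k in D]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {k: e / total for k, e in zip(D, exps)}

D = {'pagina': 'livro', 'cidade': 'pais', 'paginacao': 'encadernacao'}
weights = distance_attention("pagina", D)
print(weights)
```

With this choice an exact match gets the largest weight, and the weights can be used directly to average the values if those are numeric.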

pandalabme · September 9, 2023, 7:28am

My solutions to the exercises: 11.1