https://d2l.ai/chapter_natural-language-processing-pretraining/word2vec-pretraining.html
What exactly does the nn.Embedding layer do? It seems to be just a linear layer. Also, the skip_gram() function looks different from the flow shown in the paper; I did not understand this part. I looked at some other skip-gram implementations, and this one matches the flow in the paper exactly: https://www.kaggle.com/karthur10/skip-gram-implementation-with-pytorch-step-by-step
nn.Embedding is a lookup table: given token indices, it returns the corresponding rows of its weight matrix. Seen that way, the implementation is the same as the flow in the paper; see the small sketch below.
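A minimal sketch of that lookup behaviour (toy sizes, not the book's code): indexing an nn.Embedding returns the matching rows of its weight matrix.

```python
import torch
from torch import nn

# nn.Embedding is an index lookup into its weight matrix.
embed = nn.Embedding(num_embeddings=10, embedding_dim=4)
idx = torch.tensor([[1, 2, 3]])   # a batch of token indices
out = embed(idx)                  # shape: (1, 3, 4)

# The same rows can be gathered directly from the weight matrix.
print(torch.allclose(out, embed.weight[idx]))   # True
```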
There is a typo in cell 2:
print(f'Parameter embedding_weight ({embed.weight.shape}, '
      'dtype={embed.weight.dtype})')
should be:
print(f'Parameter embedding_weight ({embed.weight.shape}, '
      f'dtype={embed.weight.dtype})')
Because the second string literal lacks the f prefix, {embed.weight.dtype} is treated as literal text and printed as-is instead of being interpolated.
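A minimal standalone illustration of the same pitfall (x is just a throwaway variable): the f prefix applies per string literal, so each piece of an implicitly concatenated string needs its own f.

```python
x = 3.14
print(f'value ({x}, '
      'type={type(x)})')    # second literal has no f: prints "type={type(x)}" verbatim
print(f'value ({x}, '
      f'type={type(x)})')   # with f on both: prints "type=<class 'float'>"
```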
This is the first time I have seen this kind of style in PyTorch. Could you please update the code to a more readable form? For selecting the device, just write torch.device("cuda" if torch.cuda.is_available() else "cpu"). In the train function there is a closure that applies Xavier initialization to the weights, but PyTorch layers already come with default initialization. And d2l.Accumulator(2) is a little confusing. I like the book, but it would be better to have a more PyTorch-like style.
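For what it's worth, here is a rough sketch of what a plainer PyTorch version of the training loop could look like. This is not the book's code: net is assumed to be a callable taking (center, contexts_and_negatives), and loss_fn stands in for the chapter's masked BCE loss.

```python
import torch

# Device selection in the usual one-liner form.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def train(net, data_iter, lr, num_epochs, loss_fn):
    # Hypothetical rewrite: keep PyTorch's default parameter initialization,
    # so there is no explicit Xavier init_weights closure.
    net.to(device)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for epoch in range(num_epochs):
        total_loss, num_examples = 0.0, 0  # plain counters instead of d2l.Accumulator(2)
        for center, context_negative, mask, label in data_iter:
            center, context_negative, mask, label = (
                t.to(device) for t in (center, context_negative, mask, label))
            pred = net(center, context_negative)
            # Per-example loss, rescaled to average only over non-padded positions.
            l = (loss_fn(pred.reshape(label.shape).float(), label.float(), mask)
                 * mask.shape[1] / mask.sum(axis=1))
            optimizer.zero_grad()
            batch_loss = l.sum()
            batch_loss.backward()
            optimizer.step()
            total_loss += batch_loss.item()
            num_examples += l.numel()
        print(f'epoch {epoch + 1}, loss {total_loss / num_examples:.3f}')
```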
Got the error: AttributeError: Can't pickle local object 'load_data_ptb.<locals>.PTBDataset'
for the cell:
lr, num_epochs = 0.002, 5
train(net, data_iter, lr, num_epochs)
Has anyone had the same problem?
I don't understand loss(pred, label, mask) * mask.shape[1] / mask.sum(axis=1). Why do we need the * mask.shape[1] / mask.sum(axis=1) factor?
In the function load_data_ptb, set num_workers=0 and then regenerate data_iter; that should fix it.
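For anyone wondering why that helps: with num_workers > 0, the DataLoader worker processes need to pickle the dataset, and a class defined inside a function (like PTBDataset inside load_data_ptb) cannot be pickled. A standalone demonstration (make_dataset is just a stand-in, not the book's function):

```python
import pickle

def make_dataset():
    # Stand-in for load_data_ptb, which defines PTBDataset inside the function.
    class PTBDataset:
        pass
    return PTBDataset()

try:
    pickle.dumps(make_dataset())
except Exception as e:
    # e.g. AttributeError: Can't pickle local object 'make_dataset.<locals>.PTBDataset'
    print(type(e).__name__, e)

# With num_workers=0 the DataLoader runs in the main process,
# so the dataset never needs to be pickled and the error goes away.
```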
The loss zeroes out the padded positions via the mask and then averages over the full padded length (all mask.shape[1] columns). Multiplying by mask.shape[1] undoes that division, and dividing by mask.sum(axis=1) averages over only the valid positions, so each example's loss becomes a mean over its real context/negative words rather than over the padding too.
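A toy example of that rescaling (made-up numbers, not the chapter's data), assuming the padded positions have already been zeroed by the mask:

```python
import torch

# One example, padded length 4, only the first 2 positions are real (mask == 1).
per_pos_loss = torch.tensor([[0.5, 0.3, 0.0, 0.0]])
mask = torch.tensor([[1., 1., 0., 0.]])

mean_over_padded = per_pos_loss.mean(axis=1)               # 0.8 / 4 = 0.2
mean_over_valid = (mean_over_padded
                   * mask.shape[1] / mask.sum(axis=1))      # 0.2 * 4 / 2 = 0.4
print(mean_over_padded, mean_over_valid)   # 0.4 == (0.5 + 0.3) / 2
```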
Some questions:
a. How do we measure "semantic similarity"?
b. Why use cosine similarity rather than the dot product? The latter is what is used in the training step, and the two are closely related but not identical.
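On (b), one relationship worth noting: cosine similarity is just the dot product after normalizing both vectors to unit length, so it depends only on direction and ignores differences in embedding norm. A quick check with toy vectors:

```python
import torch
import torch.nn.functional as F

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([2.0, 4.0, 6.0])   # same direction as u, twice the norm

dot = torch.dot(u, v)                               # grows with vector length: 28
cos = torch.dot(u, v) / (u.norm() * v.norm())       # direction only: 1.0
print(dot, cos, F.cosine_similarity(u, v, dim=0))   # the last two agree
```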