Pretraining word2vec

https://d2l.ai/chapter_natural-language-processing-pretraining/word2vec-pretraining.html

What exactly does the nn.Embedding layer do? It seems to be just a linear layer. Also, the skip_gram() function looks different from the flow shown in the paper, and I did not understand this part. I found a skip-gram implementation that follows the paper's flow exactly: https://www.kaggle.com/karthur10/skip-gram-implementation-with-pytorch-step-by-step

nn.Embedding is a kind of lookup table: it returns the rows of its weight matrix selected by the input indices, so the implementation ends up being the same as in the paper.
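
For illustration, here is a minimal sketch (the sizes and shapes are made up, not the book's exact cells) of why the embedding lookup plus a batch matrix multiplication reproduces the dot products u_o^T v_c from the paper:

```python
import torch
from torch import nn

# nn.Embedding stores a (num_embeddings, embedding_dim) weight matrix and
# simply returns the rows selected by the input indices -- no bias, no matmul.
embed = nn.Embedding(num_embeddings=20, embedding_dim=4)
idx = torch.tensor([[1, 2, 3]])
print(embed(idx).shape)                                # torch.Size([1, 3, 4])
print(torch.equal(embed(idx)[0, 0], embed.weight[1]))  # True: it is just row 1

# The skip-gram forward pass is then a batch of dot products between the
# center-word vector and every context/negative-word vector:
def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    v = embed_v(center)                        # (batch, 1, d)
    u = embed_u(contexts_and_negatives)        # (batch, max_len, d)
    return torch.bmm(v, u.permute(0, 2, 1))    # (batch, 1, max_len)
```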

There is a typo in cell 2:

print(f'Parameter embedding_weight ({embed.weight.shape}, '
      'dtype={embed.weight.dtype})')

should be:

print(f'Parameter embedding_weight ({embed.weight.shape}, '
      f'dtype={embed.weight.dtype})')

Currently the second string is not an f-string, so {embed.weight.dtype} is treated as a literal and printed as is.

This is the first time I have seen this kind of style in PyTorch. Could you please update the code to a more readable form? For selecting the device, just write torch.device("cuda" if torch.cuda.is_available() else "cpu"). In the train function there is a closure to initialize weights with Xavier initialization, but this is already implemented as a default in PyTorch. And d2l.Accumulator(2) is a little bit confusing :wink: I like the book, but it would be better to have a more PyTorch-like style.
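
For reference, a sketch of the plainer idioms suggested above (vocab_size and embed_size are hypothetical; this is not the book's code):

```python
import torch
from torch import nn

# Hypothetical sizes, just to make the snippet runnable.
vocab_size, embed_size = 10000, 100

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Sequential(nn.Embedding(vocab_size, embed_size),
                    nn.Embedding(vocab_size, embed_size))

# If Xavier initialization is still wanted, an explicit named function
# applied with net.apply() reads more clearly than a closure inside train():
def init_weights(module):
    if isinstance(module, nn.Embedding):
        nn.init.xavier_uniform_(module.weight)

net.apply(init_weights)
net.to(device)

# Plain counters can replace d2l.Accumulator(2) for tracking the running loss:
total_loss, num_examples = 0.0, 0
```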

I got the error:

AttributeError: Can't pickle local object 'load_data_ptb.<locals>.PTBDataset'

for the cell:

lr, num_epochs = 0.002, 5
train(net, data_iter, lr, num_epochs)

Has anyone had the same problem?

I don't understand loss(pred, label, mask) * mask.shape[1] / mask.sum(axis=1). Why do we need to multiply by mask.shape[1] / mask.sum(axis=1)?

In the function load_data_ptb, set num_workers=0 and then re-generate data_iter; that will help (see the sketch below).
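
A self-contained sketch of why this works (the dataset below is a placeholder; in the book the change goes where load_data_ptb builds its DataLoader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# With num_workers=0 all loading happens in the main process, so the locally
# defined dataset class never has to be pickled into worker processes --
# which is exactly the step that fails on Windows, where workers are spawned.
dataset = TensorDataset(torch.arange(10))          # placeholder data
data_iter = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0)
for (batch,) in data_iter:
    pass
```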

This rescaling turns the loss into a mask-aware mean over the column (context/negative) dimension: the loss averages over the full padded length mask.shape[1], so multiplying by mask.shape[1] and dividing by mask.sum(axis=1) makes it an average over only the valid (non-masked) positions in each row.
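
A toy example (not the book's data; it assumes the per-position loss is already zeroed at masked positions and averaged over the padded length):

```python
import torch

# One row with three valid positions and one padded position.
per_position_loss = torch.tensor([[1.0, 2.0, 3.0, 0.0]])
mask = torch.tensor([[1.0, 1.0, 1.0, 0.0]])

mean_over_padded_len = (per_position_loss * mask).mean(dim=1)  # (1+2+3)/4 = 1.5
mean_over_valid = mean_over_padded_len * mask.shape[1] / mask.sum(axis=1)
print(mean_over_valid)  # tensor([2.]) = (1+2+3)/3, averaged over valid positions only
```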

My solutions to the exercises: 15.4