Pretraining word2vec

What exactly does the nn.Embedding layer do? It seems to be just a linear layer. Also, the skip_gram() function looks different from the flow shown in the paper, and I did not understand this part. I have seen another skip-gram implementation that follows the flow shown in the paper exactly:

nn.Embedding is a kind of lookup table: given token indices, it returns the corresponding rows of its weight matrix. With that in mind, the implementation is the same as in the paper.
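
To make the lookup-table view concrete, here is a minimal sketch (the sizes and indices are arbitrary and chosen only for illustration). It checks that indexing nn.Embedding gives the same result as multiplying one-hot vectors by the weight matrix, which is why the layer can look like a bias-free linear layer:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, embed_dim = 20, 4                 # arbitrary illustrative sizes
embed = nn.Embedding(vocab_size, embed_dim)

idx = torch.tensor([[1, 2, 3], [4, 5, 6]])    # (batch, tokens)

# Lookup: selects rows of embed.weight by index.
looked_up = embed(idx)                        # shape (2, 3, 4)

# Same result via one-hot vectors times the weight matrix.
one_hot = F.one_hot(idx, num_classes=vocab_size).float()   # shape (2, 3, 20)
via_matmul = one_hot @ embed.weight                         # shape (2, 3, 4)

same = torch.allclose(looked_up, via_matmul)
print(looked_up.shape, same)
```

The lookup simply skips the multiplication by zeros, so it is cheaper than the explicit one-hot product but mathematically equivalent.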

There is a typo in cell 2: print(f'Parameter embedding_weight ({embed.weight.shape}, '

should be: print(f'Parameter embedding_weight ({embed.weight.shape}, '

Currently {embed.weight.dtype} is treated as literal text and printed as is, presumably because the continuation string is missing its f prefix.
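
The continuation line is not shown in this post, so the following is only a sketch under the assumption that the typo is a missing f prefix on the second string literal. The names shape and dtype are hypothetical stand-ins for embed.weight.shape and embed.weight.dtype:

```python
# Hypothetical stand-ins for embed.weight.shape and embed.weight.dtype.
shape, dtype = (20, 4), "float32"

# The second literal has no f prefix, so its braces are kept literally.
broken = (f'Parameter embedding_weight ({shape}, '
          'dtype={dtype})')

# Each literal that contains a placeholder needs its own f prefix.
fixed = (f'Parameter embedding_weight ({shape}, '
         f'dtype={dtype})')

print(broken)
print(fixed)
```

Adjacent string literals are concatenated, but the f prefix applies to each literal separately.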

This is the first time I have seen this style in PyTorch. Could you please update the code to a more readable form? For selecting the device, just write torch.device("cuda" if torch.cuda.is_available() else "cpu"). In the train function, there is a closure that initializes the weights with Xavier initialization, but I believe PyTorch already applies a default initialization. Also, d2l.Accumulator(2) is a little confusing :wink: I like the book, but it would be better to have a more PyTorch-like style.
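
For reference, the device selection suggested here can be sketched as follows (the tensor x is only there to show that the chosen device is usable):

```python
import torch

# Fall back to the CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and modules, via .to(device)) can then be placed on that device.
x = torch.ones(2, 3, device=device)
print(device, x.device.type)
```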

I got the error: AttributeError: Can't pickle local object 'load_data_ptb.<locals>.PTBDataset'
for the cell:
lr, num_epochs = 0.002, 5
train(net, data_iter, lr, num_epochs)

Does anyone have the same problem?
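
A likely cause, offered as an assumption since the full traceback is not shown: when the DataLoader uses worker processes on a platform that starts them with spawn (for example Windows or macOS), the dataset has to be pickled, and a class defined inside a function cannot be pickled. The sketch below reproduces the same kind of error with the standard library only; make_local and LocalDataset are hypothetical names:

```python
import pickle

def make_local():
    # A class defined inside a function, similar to a dataset class
    # nested inside a data-loading helper.
    class LocalDataset:
        def __init__(self, data):
            self.data = data
    return LocalDataset([1, 2, 3])

try:
    pickle.dumps(make_local())
    local_ok = True
except (AttributeError, pickle.PicklingError) as err:
    # pickle cannot look up a class that lives in a function's local scope.
    local_ok = False
    print("pickling failed:", err)
```

If this is indeed the cause, the usual workarounds are to load the data with num_workers=0 or to move the dataset class to module level.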

I don't understand loss(pred, label, mask) * mask.shape[1] / mask.sum(axis=1). Why do we need the factor mask.shape[1] / mask.sum(axis=1)?
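
One possible reading, assuming the loss computes an element-wise binary cross-entropy weighted by mask and then takes the mean over the padded length (the loss definition is not shown in this thread): the mean divides every row by mask.shape[1], including padded positions, so multiplying by mask.shape[1] / mask.sum(axis=1) rescales each row to an average over its valid positions only. A minimal sketch with made-up numbers:

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([[1.1, -2.2, 3.3, -4.4],
                     [1.1, -2.2, 3.3, -4.4]])
label = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
mask = torch.tensor([[1.0, 1.0, 1.0, 1.0],    # no padding
                     [1.0, 1.0, 0.0, 0.0]])   # last two positions are padding

# Assumed loss: masked element-wise BCE, averaged over the padded length.
per_elem = F.binary_cross_entropy_with_logits(
    pred, label, weight=mask, reduction="none")
padded_mean = per_elem.mean(dim=1)            # divides by mask.shape[1] == 4

# Rescale so each row is divided by its number of valid positions instead.
rescaled = padded_mean * mask.shape[1] / mask.sum(dim=1)

# Reference: sum of the masked losses over the count of valid positions.
reference = per_elem.sum(dim=1) / mask.sum(dim=1)
print(padded_mean, rescaled, reference)
```

Without the rescaling, rows with more padding would get a smaller loss just because the zeros from padded positions are included in the average.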