https://d2l.ai/chapter_natural-language-processing-pretraining/word2vec-pretraining.html
What exactly does the nn.Embedding layer do? It seems to be just a linear layer. Also, the skip_gram() function looks different from the flow shown in the paper; I did not understand this part. I looked at some other skip-gram implementations, and this one matches the flow in the paper exactly: https://www.kaggle.com/karthur10/skip-gram-implementation-with-pytorch-step-by-step
nn.Embedding is a lookup table: given token indices, it returns the corresponding rows of its weight matrix. Seen that way, the implementation is the same as the flow in the paper; see the small sketch below.
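A minimal sketch of that lookup behaviour (toy sizes, not the book's code): indexing an nn.Embedding returns the matching rows of its weight matrix.

```python
import torch
from torch import nn

# nn.Embedding is an index lookup into its weight matrix.
embed = nn.Embedding(num_embeddings=10, embedding_dim=4)
idx = torch.tensor([[1, 2, 3]])   # a batch of token indices
out = embed(idx)                  # shape: (1, 3, 4)

# The same rows can be gathered directly from the weight matrix.
print(torch.allclose(out, embed.weight[idx]))   # True
```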
There is a typo in cell 2:
print(f'Parameter embedding_weight ({embed.weight.shape}, '
      'dtype={embed.weight.dtype})')
should be:
print(f'Parameter embedding_weight ({embed.weight.shape}, '
      f'dtype={embed.weight.dtype})')
Because the second string literal lacks the f prefix, {embed.weight.dtype} is treated as literal text and printed as-is instead of being interpolated.
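A minimal standalone illustration of the same pitfall (x is just a throwaway variable): the f prefix applies per string literal, so each piece of an implicitly concatenated string needs its own f.

```python
x = 3.14
print(f'value ({x}, '
      'type={type(x)})')    # second literal has no f: prints "type={type(x)}" verbatim
print(f'value ({x}, '
      f'type={type(x)})')   # with f on both: prints "type=<class 'float'>"
```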
This is the first time I have seen this kind of style in PyTorch. Could you please update the code to a more readable form? For selecting the device, just write torch.device("cuda" if torch.cuda.is_available() else "cpu"). In the train function there is a closure that applies Xavier initialization to the weights, but PyTorch layers already come with default initialization. And d2l.Accumulator(2) is a little confusing. I like the book, but it would be better to have a more PyTorch-like style.
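For what it's worth, here is a rough sketch of what a plainer PyTorch version of the training loop could look like. This is not the book's code: net is assumed to be a callable taking (center, contexts_and_negatives), and loss_fn stands in for the chapter's masked BCE loss.

```python
import torch

# Device selection in the usual one-liner form.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def train(net, data_iter, lr, num_epochs, loss_fn):
    # Hypothetical rewrite: keep PyTorch's default parameter initialization,
    # so there is no explicit Xavier init_weights closure.
    net.to(device)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for epoch in range(num_epochs):
        total_loss, num_examples = 0.0, 0  # plain counters instead of d2l.Accumulator(2)
        for center, context_negative, mask, label in data_iter:
            center, context_negative, mask, label = (
                t.to(device) for t in (center, context_negative, mask, label))
            pred = net(center, context_negative)
            # Per-example loss, rescaled to average only over non-padded positions.
            l = (loss_fn(pred.reshape(label.shape).float(), label.float(), mask)
                 * mask.shape[1] / mask.sum(axis=1))
            optimizer.zero_grad()
            batch_loss = l.sum()
            batch_loss.backward()
            optimizer.step()
            total_loss += batch_loss.item()
            num_examples += l.numel()
        print(f'epoch {epoch + 1}, loss {total_loss / num_examples:.3f}')
```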
Got the error: AttributeError: Can't pickle local object 'load_data_ptb.<locals>.PTBDataset'
for the cell:
lr, num_epochs = 0.002, 5
train(net, data_iter, lr, num_epochs)
Has anyone had the same problem?
I don't understand loss(pred, label, mask) * mask.shape[1] / mask.sum(axis=1). Why do we need the * mask.shape[1] / mask.sum(axis=1) factor?
In the function load_data_ptb, set num_workers=0 and then regenerate data_iter; that should fix it.
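For anyone wondering why that helps: with num_workers > 0, the DataLoader worker processes need to pickle the dataset, and a class defined inside a function (like PTBDataset inside load_data_ptb) cannot be pickled. A standalone demonstration (make_dataset is just a stand-in, not the book's function):

```python
import pickle

def make_dataset():
    # Stand-in for load_data_ptb, which defines PTBDataset inside the function.
    class PTBDataset:
        pass
    return PTBDataset()

try:
    pickle.dumps(make_dataset())
except Exception as e:
    # e.g. AttributeError: Can't pickle local object 'make_dataset.<locals>.PTBDataset'
    print(type(e).__name__, e)

# With num_workers=0 the DataLoader runs in the main process,
# so the dataset never needs to be pickled and the error goes away.
```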
The loss zeroes out the padded positions via the mask and then averages over the full padded length (all mask.shape[1] columns). Multiplying by mask.shape[1] undoes that division, and dividing by mask.sum(axis=1) averages over only the valid positions, so each example's loss becomes a mean over its real context/negative words rather than over the padding too.
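A toy example of that rescaling (made-up numbers, not the chapter's data), assuming the padded positions have already been zeroed by the mask:

```python
import torch

# One example, padded length 4, only the first 2 positions are real (mask == 1).
per_pos_loss = torch.tensor([[0.5, 0.3, 0.0, 0.0]])
mask = torch.tensor([[1., 1., 0., 0.]])

mean_over_padded = per_pos_loss.mean(axis=1)               # 0.8 / 4 = 0.2
mean_over_valid = (mean_over_padded
                   * mask.shape[1] / mask.sum(axis=1))      # 0.2 * 4 / 2 = 0.4
print(mean_over_padded, mean_over_valid)   # 0.4 == (0.5 + 0.3) / 2
```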
Some questions:
a. How do we measure "semantic similarity"?
b. Why use cosine similarity rather than the dot product? The latter is what is used in the training step, and the two are closely related but not identical.
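On (b), one relationship worth noting: cosine similarity is just the dot product after normalizing both vectors to unit length, so it depends only on direction and ignores differences in embedding norm. A quick check with toy vectors:

```python
import torch
import torch.nn.functional as F

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([2.0, 4.0, 6.0])   # same direction as u, twice the norm

dot = torch.dot(u, v)                               # grows with vector length: 28
cos = torch.dot(u, v) / (u.norm() * v.norm())       # direction only: 1.0
print(dot, cos, F.cosine_similarity(u, v, dim=0))   # the last two agree
```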