Could someone please explain what has happened in the BPRLoss.forward() function to the L^2 penalty that we have in (16.5.2)?
As far as I can see, in forward only the term
- np.sum(np.log(npx.sigmoid(distances)), 0, keepdims=True)
contributes to the loss, without the L^2 norm of the model parameters.
The L^2 penalty can be replaced with weight decay during training, so there is no need to add the L^2 norm to the loss again.
You can refer to the "Training and Evaluating the Model" part of 16.3. Matrix Factorization:
"In the training function, we adopt the 𝐿2 loss with weight decay. The weight decay mechanism has the same effect as the 𝐿2 regularization."
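To make the equivalence concrete, here is a minimal sketch in plain NumPy (not the book's MXNet code). It assumes plain SGD and a penalty written as (lam / 2) * ||w||^2, so that its gradient is lam * w. The toy least-squares loss and the names data_grad, step_explicit_l2 and step_weight_decay are made up for illustration. One step with the penalty inside the loss gives the same weights as one step on the unpenalized loss with weight decay wd = lam.

```python
import numpy as np

def data_grad(w, x, y):
    # Gradient of the unregularized toy loss 0.5 * mean((x @ w - y) ** 2).
    return x.T @ (x @ w - y) / len(y)

def step_explicit_l2(w, x, y, lr, lam):
    # Penalty is part of the loss: loss + (lam / 2) * ||w||^2.
    # Its gradient contributes an extra lam * w term.
    grad = data_grad(w, x, y) + lam * w
    return w - lr * grad

def step_weight_decay(w, x, y, lr, wd):
    # Loss has no penalty; the update itself shrinks the weights.
    return (1 - lr * wd) * w - lr * data_grad(w, x, y)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w0 = rng.normal(size=3)

w_l2 = step_explicit_l2(w0, x, y, lr=0.1, lam=0.01)
w_wd = step_weight_decay(w0, x, y, lr=0.1, wd=0.01)
print(np.allclose(w_l2, w_wd))
```

If the penalty is written as lam * ||w||^2 without the factor 1/2, the matching weight decay is 2 * lam. Also note that this exact match is shown here only for plain SGD; with adaptive optimizers it depends on how the optimizer applies the decay term.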