Weight Decay

http://d2l.ai/chapter_multilayer-perceptrons/weight-decay.html

I accidentally set the learning rate to 0.1 and always got nan for the L2 norm of w in the from-scratch implementation. I would like to know where the computation fails, but I could not find the answer. Does anyone know? Thank you.

Could you post your code, please?
@rezahabibi96
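
Without the original code the cause cannot be confirmed. One common way a too-large learning rate ends in nan, though, is plain divergence: the weights grow every step, overflow to inf, and then inf - inf produces nan. The sketch below shows this on a hypothetical one-dimensional quadratic loss. The function `run_gd`, the curvature of 40, and the smaller learning rate of 0.003 are illustrative assumptions, not values taken from the posted problem.

```python
import math

def run_gd(lr, curvature=40.0, w0=1.0, steps=1000):
    """Gradient descent on the toy loss 0.5 * curvature * w**2 (hypothetical setup)."""
    w = w0
    for _ in range(steps):
        grad = curvature * w   # derivative of 0.5 * curvature * w**2
        w = w - lr * grad      # each step multiplies w by (1 - lr * curvature)
    return w

# |1 - 0.003 * 40| = 0.88 < 1, so w shrinks toward 0
w_small = run_gd(lr=0.003)

# |1 - 0.1 * 40| = 3 > 1, so |w| triples each step, overflows to inf,
# and the next update computes inf - inf, which is nan
w_large = run_gd(lr=0.1)

print(w_small, w_large)
print(math.isnan(w_large))
```

If something similar is happening in the from-scratch implementation, printing the norm of w after every minibatch should show it growing steadily before it turns into inf and then nan.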

Please consider a more general definition of weight decay in this chapter. You can refer to the findings of "Decoupled Weight Decay Regularization" by Ilya Loshchilov and Frank Hutter.
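
To make the distinction concrete, here is a minimal sketch of a single scalar Adam step with either an L2 penalty (the decay term is added to the gradient) or decoupled weight decay (the decay is applied directly to the weight). The function `adam_step`, its parameter names, and the choice to scale the decoupled term by the learning rate are assumptions for illustration; this is one common formulation, not the chapter's implementation.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    """One scalar Adam update (hypothetical helper for illustration)."""
    w_prev = w
    # L2 penalty: the decay term enters the gradient and gets rescaled by Adam
    grad = grad + l2 * w_prev
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w = w_prev - lr * m_hat / (math.sqrt(v_hat) + eps)
    # Decoupled weight decay: shrink the weight directly, bypassing the rescaling
    w = w - lr * decoupled_wd * w_prev
    return w, m, v

# Zero data gradient, so only the decay term moves the weight
w_l2, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, t=1, l2=0.1)
w_dec, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, t=1, decoupled_wd=0.1)

print(w_l2)   # moves by roughly lr, almost independent of the decay strength
print(w_dec)  # moves by lr * decoupled_wd * w
```

With plain SGD the two variants give the same update when the decay is scaled by the learning rate, which is why the difference only becomes visible with adaptive optimizers such as Adam.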