I'm a newbie, so this might be a stupid question. Can somebody tell me why the loss differs between TensorFlow and PyTorch? The initial TensorFlow loss is almost twice that of PyTorch, and this seems to hold for many of the CNN architectures. Are the network parameters initialized differently? (IMO, the default initializations of TensorFlow layers are the same as how we initialize the network in PyTorch with the d2l.init_cnn function, so that can’t be the difference.) So what am I missing? Is there a difference in how the loss is calculated?
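One way to sanity-check the initialization question: for a K-class softmax classifier, an untrained network whose logits are all near zero gives a cross-entropy of about ln(K) (≈ 2.30 for 10 classes). If the output logits start out with a larger spread, the average initial loss rises above ln(K). The sketch below is a framework-free illustration that models the untrained network's logits as i.i.d. Gaussian noise with a chosen standard deviation. That modeling choice, and the helper names `cross_entropy_from_logits` and `mean_initial_loss`, are hypothetical assumptions. It does not claim this is what actually happens in either framework.

```python
import math
import random


def cross_entropy_from_logits(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)  # subtract the max before exp() to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_initial_loss(logit_std, num_classes=10, batch_size=1000, seed=0):
    """Average loss over a batch of random logits and random labels.

    Assumption for illustration only: an untrained network's outputs are
    modeled as i.i.d. Gaussian logits whose std stands in for the effect
    of the weight initialization scale.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(batch_size):
        logits = [rng.gauss(0.0, logit_std) for _ in range(num_classes)]
        label = rng.randrange(num_classes)
        total += cross_entropy_from_logits(logits, label)
    return total / batch_size


# All-zero logits (a uniform softmax) give exactly ln(10).
print(cross_entropy_from_logits([0.0] * 10, 3))  # ≈ 2.3026 = ln(10)

# A larger logit spread at initialization raises the average starting loss.
for std in (0.01, 1.0, 3.0):
    print(std, round(mean_initial_loss(std), 3))
```

If one framework's first reported loss is close to ln(10) and the other's is well above it, that points to a difference in output scale at initialization rather than in the loss formula itself. Comparing the two `from_logits`/reduction settings is still worth doing separately.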
My solutions to the exercises: 8.8