As a newbie, this might be a stupid question. Can somebody explain why there is a difference in loss between TensorFlow and PyTorch? The initial TensorFlow loss appears to be almost twice that of PyTorch, and this seems to hold for many of the CNN architectures. Are the network parameters initialized differently? (As far as I can tell, the default initializations of TensorFlow layers are the same as what the d2l.init_cnn function does in PyTorch, so that shouldn't be the difference.) So what am I missing? Is the loss calculated differently?
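One sanity check that may help here (a minimal sketch, not specific to either framework): for a C-class classifier whose freshly initialized weights produce near-uniform logits, the expected initial cross-entropy loss is ln(C), e.g. ln(10) ≈ 2.30 for a 10-class problem. If one framework reports roughly double that at step zero, the gap is more likely from initialization (or a different loss reduction) than from the cross-entropy formula itself:

```python
import math

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example:
    # loss = log(sum_j exp(z_j)) - z_label
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

# Uniform logits model a freshly initialized classifier with no preference
# among the 10 classes; the loss then equals ln(10) regardless of the label.
uniform_logits = [0.0] * 10
print(cross_entropy(uniform_logits, 3))  # ln(10) ≈ 2.3026
```

Comparing each framework's reported initial loss against this ln(C) baseline narrows down which side deviates.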
Denote by F(e) the cumulative distribution function (CDF) for errors committed by networks of a given design space.
How is the term 'error' quantified here?
It looks like it is the percentage of misclassified samples, though neither this book nor the paper that introduced the strategy (Designing Network Design Spaces) says explicitly what is meant by error.
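Under that reading, F(e) can be estimated empirically: sample models from the design space, train and evaluate each one, and report the fraction whose error is at most e. A minimal sketch with made-up error values (the numbers below are purely illustrative):

```python
# Hypothetical top-1 error rates of five models sampled from a design space.
errors = [0.32, 0.28, 0.41, 0.25, 0.30]

def F(e, errors=errors):
    # Empirical CDF: fraction of sampled models with error <= e.
    return sum(err <= e for err in errors) / len(errors)

print(F(0.30))  # 0.6 -> 60% of the sampled models reach error <= 30%
```

A design space whose F(e) curve lies above another's is better in the paper's sense: more of its sampled models achieve any given error threshold.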
These are my study notes (in Chinese).
This chapter reminds me of electronic system design: large design spaces and a huge number of publications on design case studies, until eventually brilliant researchers find physical or mathematical models. Abstractions are created and guiding principles are established.
This doesn't seem to be the case for deep learning (yet). I suspect people are wasting enormous resources on suboptimal training runs and explorations, while the elegant, fundamental, but difficult work receives too little attention.