https://d2l.ai/chapter_multilayer-perceptrons/mlp-implementation.html
So we are estimating about 200k weights (785x256 + 256x10) with 60k data points. Is it an overfitting?
So we are estimating about 200k weights (785x256 + 256x10) with 60k data points. Is it an overfitting?