Concise Implementation of Multilayer Perceptron

Hi @Nish, did you solve this problem? I ran into the same problem.


But I think this is a normal phenomenon as the learning rate increases: if the learning rate is too large, each gradient step overshoots the minimum, so the loss curve goes wild.
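For example (a minimal sketch assuming the chapter's setup, where the model is trained with torch.optim.SGD), simply lowering the learning rate passed to the optimizer usually brings the curve back under control:

import torch
from torch import nn

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# lr=1.0 tends to overshoot the minimum and make the loss curve oscillate;
# the chapter's value of lr=0.1 trains stably.
trainer = torch.optim.SGD(net.parameters(), lr=0.1)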

  1. If there is a drop in the plot, is that an error? If so, why?

  2. If my loss is 0 at the end of training, does that mean it is “overfitting”, or is it a “good model”?

There is no train loss plot. What is the error?

This happened to me after running the training loop four times. What does this mean?

@samyfr7
I guess it was caused by the train loss being too small to show up in the figure, which I infer from the high train accuracy.
Oh, you can see my point from my third training run:


But why was your figure so different from mine?

Maybe different hyperparameters?

It looks like gradient clipping (grad_clipping) can solve this problem, and because the learning rate is bigger than normal, the learning process is faster than normal.

The modified code is shown below:


[screenshot of the modified training code]
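The screenshot itself is not reproduced here, but a minimal sketch of what such a change could look like is below. It assumes the chapter's d2l data loader; the clipping threshold max_norm=1.0 and lr=0.5 are illustrative choices, not values taken from the screenshot.

import torch
from torch import nn
from d2l import torch as d2l

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.5)  # larger-than-usual learning rate
train_iter, test_iter = d2l.load_data_fashion_mnist(256)

for epoch in range(10):
    for X, y in train_iter:
        trainer.zero_grad()
        l = loss(net(X), y)
        l.backward()
        # Clip the gradient norm so a large learning rate cannot blow up training.
        torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0)
        trainer.step()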

Exercises

  1. Try adding different numbers of hidden layers (you may also modify the learning rate). What setting works best?
  • It worked for me by just adding one more layer; with SGD the best learning rate was 0.1 (see the sketch after this list).
  2. Try out different activation functions. Which one works best?
  • I tried Adam, but SGD was the best (note that these are optimizers rather than activation functions).
  3. Try different schemes for initializing the weights. What method works best?
  • I tried setting all linear layer weights to zero, but normal initialization works best.
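The settings above are judgment calls rather than the only correct answers; a minimal sketch of the three variations (an extra hidden layer, a different activation, Xavier initialization instead of N(0, 0.01)) might look like this:

import torch
from torch import nn

# Exercise 1: two hidden layers instead of one, trained with SGD at lr=0.1.
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(),
                    nn.Linear(256, 128), nn.ReLU(),
                    nn.Linear(128, 10))

# Exercise 2: swap nn.ReLU() for another activation, e.g. nn.Tanh() or nn.Sigmoid().

# Exercise 3: a different weight-initialization scheme, e.g. Xavier.
def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

net.apply(init_xavier)
trainer = torch.optim.SGD(net.parameters(), lr=0.1)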

It looks like the learning rate is too large.

import torch
from torch import nn

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def init_weights(m):
    if type(m) == nn.Linear:  # initialize only the linear layers
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights);

My doubt here is: we are initializing the weights, but we never pass any argument to the function.
Moreover, I could not figure out what net.apply is for.

I found the answer here
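For reference, nn.Module.apply(fn) calls fn recursively on every submodule (and finally on the module itself), passing each module in as the argument; that is why init_weights needs no explicit parameter. A small demonstration:

from torch import nn

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def show_module(m):
    print(type(m).__name__)  # each submodule is passed in as m

net.apply(show_module)
# Prints Flatten, Linear, ReLU, Linear, and finally Sequential.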

Can someone help here? I am not sure why I encountered this error after running the final training loop.