But I think this is a normal phenomenon as the learning rate increases: if the learning rate is too large, each gradient step overshoots the minimum, so the loss curve goes wild.
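For intuition, here is a tiny sketch (plain Python, just for illustration, not anyone's actual training code) of gradient descent on f(x) = x^2: with a small learning rate the iterate converges, but once the step is too large it repeatedly overshoots zero and diverges.

# Gradient descent on f(x) = x^2, whose gradient is 2x and whose minimum is at x = 0
def run(lr, steps=10, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x  # standard update: x <- x - lr * f'(x)
    return x

print(run(lr=0.1))  # ~0.107: steadily approaching the minimum
print(run(lr=1.5))  # 1024.0: every step overshoots, |x| doubles, the curve "goes wild"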
@samyfr7
I guess it is because the training loss became too small to show up on the figure, which I infer from the high training accuracy.
Oh… you can see my point from my third training run:
It looks like gradient clipping can solve this problem, and because the learning rate is larger than usual, training also progresses faster than usual.
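In case it is useful, here is a minimal, self-contained sketch of what I mean by gradient clipping in PyTorch (the model, optimizer, and fake mini-batch below are my own placeholders, not the exact code from the chapter): the gradients are rescaled after backward() and before step(), so even with a large learning rate no single update can blow up.

import torch
from torch import nn

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.5)  # deliberately large learning rate

X = torch.randn(32, 1, 28, 28)   # fake mini-batch of 32 Fashion-MNIST-sized images
y = torch.randint(0, 10, (32,))  # fake labels

trainer.zero_grad()
l = loss(net(X), y)
l.backward()
# Rescale all gradients so their combined L2 norm is at most 1.0; this keeps a
# big learning rate from turning one noisy gradient into a huge parameter update.
torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0)
trainer.step()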
import torch.nn as nn

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def init_weights(m):
    # re-initialize only the fully connected layers with small Gaussian weights
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights);
My doubt here is: we are initializing the weights, but we never pass any argument to init_weights.
Moreover, I could not find out what net.apply actually does.
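For what it's worth, my understanding of those two points is that net.apply(fn) walks through net and all of its submodules and calls fn on each one, so init_weights does receive an argument (the module m) even though we never pass it explicitly. A small sketch that just prints what apply visits:

import torch.nn as nn

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def show(m):
    # apply() calls this once per module, passing each module in as `m`
    print(type(m).__name__)

net.apply(show)
# Prints Flatten, Linear, ReLU, Linear and finally Sequential itself;
# init_weights works the same way, but only touches the Linear layers.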