Implementation of Multilayer Perceptrons

mli · May 31, 2020, 2:46am

https://d2l.ai/chapter_multilayer-perceptrons/mlp-implementation.html

Abhinav_Tripathi · August 28, 2020, 9:31am

I am getting lower train_acc/test_acc and higher train_loss than expected (results attached as image: MLP_scratch_loss).

Is there something wrong with my code? It is as follows:

from d2l import mxnet as d2l
from mxnet import gluon, np, npx, autograd
npx.set_np()
batch_size = 256
test_iter, train_iter = d2l.load_data_fashion_mnist(batch_size)

num_inputs, num_hidden, num_outputs = 784, 256, 10

w1 = np.random.normal(scale=0.01, size=(num_inputs, num_hidden))
b1 = np.zeros(num_hidden)
w2 = np.random.normal(scale=0.01, size=(num_hidden, num_outputs))
b2 = np.zeros(num_outputs)

params = [w1, b1, w2, b2]

for param in params:
    param.attach_grad()

def relu(X):
    return np.maximum(X, 0)

def net(X):
    X = X.reshape((-1, num_inputs))
    H = relu(np.dot(X, w1) + b1)
    return np.dot(H, w2) + b2

loss = gluon.loss.SoftmaxCrossEntropyLoss()

num_epochs, lr = 10, 0.1

d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs,
              lambda batch_size: d2l.sgd(params, lr, batch_size))

rezahabibi96 · September 3, 2020, 6:00am

Here we do not apply the softmax function to the output layer, so some outputs may be negative and they need not sum to 1, as the probability axioms would require. How, then, do we decide the predicted class? Is it the one with the maximum value? Am I missing something in the explanation of why softmax is not applied? Thank you.

kusur · September 3, 2020, 6:56am

The reason we don’t apply softmax in the implementation is that the cross-entropy loss takes care of the transformation. This is done to avoid potential numerical overflow issues. If you look at the implementation of cross-entropy loss in your preferred deep learning framework, you will find several versions that take raw scores as inputs, since this helps with the problems mentioned in http://d2l.ai/chapter_linear-networks/softmax-regression-concise.html. Hope this helps.
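
To make the two points above concrete, here is a minimal sketch in plain Python. It is not the MXNet implementation, and the helper names (naive_softmax, log_softmax, cross_entropy_from_logits, predict) are made up for illustration. It assumes the loss is computed from raw scores with the usual log-sum-exp trick, and it shows that the predicted class is the same whether you take the argmax of the raw scores or of the softmax probabilities.

```python
import math

def naive_softmax(logits):
    # Direct definition: exp(o_j) / sum_k exp(o_k).
    # math.exp raises OverflowError once a score is large enough.
    exps = [math.exp(o) for o in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # Log-sum-exp trick: subtract the largest score before exponentiating,
    # so every exponent is <= 0 and exp() cannot overflow.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(o - m) for o in logits))
    return [o - log_sum for o in logits]

def cross_entropy_from_logits(logits, label):
    # Cross-entropy for one example, taking raw scores (hypothetical helper).
    return -log_softmax(logits)[label]

def predict(scores):
    # Index of the largest entry; works on raw scores or probabilities.
    return max(range(len(scores)), key=lambda j: scores[j])

# Raw scores may be negative and need not sum to 1.
logits = [2.0, -1.0, 0.5]
probs = naive_softmax(logits)

# Softmax is monotonic, so the argmax is unchanged.
same_prediction = predict(logits) == predict(probs)

# With large scores the naive version overflows; the stable one does not.
big = [1000.0, 0.0, -1000.0]
try:
    naive_softmax(big)
    overflowed = False
except OverflowError:
    overflowed = True
stable_loss = cross_entropy_from_logits(big, 0)
```

So for prediction you can simply take the largest raw score, and softmax only matters inside the loss, where the framework can combine it with the logarithm in a numerically safer way.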