Multilayer Perceptrons

http://d2l.ai/chapter_multilayer-perceptrons/mlp.html

Nit: There is a small typo: “diagramtically” -> “diagrammatically”

Should equation (4.1.5) be H_1 = \sigma(X W_1 + b_1) to match the definitions of the weight and input matrices in Section 3.4.1.3?

Hi @Andreas_Terzis, sharp eyes! Fixed here https://github.com/d2l-ai/d2l-en/pull/1050/files

As for your second question, sorry for the inconsistency. Both forms work; the matrices are simply transposes of each other. To be specific:

  • In equation (4.1.5), \mathbf{W}_1 has dimension (q, d) and \mathbf{X} has dimension (d, n).

  • On the other hand, in the equation of Section 3.4.1.3, \mathbf{W} has dimension (d, q) and \mathbf{X} has dimension (n, d) — see the identity below.
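To make the equivalence explicit: this is just the standard transpose identity, together with the fact that the elementwise \sigma commutes with transposition,

\mathbf{H}_1 = \sigma(\mathbf{W}_1 \mathbf{X} + \mathbf{b}_1) \iff \mathbf{H}_1^\top = \sigma(\mathbf{X}^\top \mathbf{W}_1^\top + \mathbf{b}_1^\top),

where \mathbf{X}^\top is (n, d) and \mathbf{W}_1^\top is (d, q), i.e. exactly the shapes used in Section 3.4.1.3 (with the bias broadcast over the batch as usual).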

Let me know if that is clear enough.

Thanks for the quick reply and for clarifying the differences in matrix dimensions.

You might consider explicitly mentioning the dimensions of \mathbf{W}_1 and \mathbf{X} in (4.1.5) to avoid confusion with the previous definitions. Doing so would also help readers who go through the book sequentially.

Best

Hi @Andreas_Terzis, great feedback! We will consider your suggestions and fix this as soon as possible.

What would be the explanation for the last question? As far as I can tell, it makes little difference if we apply the activation function row-wise (which I’m guessing refers to applying the activation function to each instance of the batch one by one) or apply the function to the whole batch. Won’t both ways yield a similar result?

Hi @Kushagra_Chaturvedy, a minibatch may not be as representative as the whole batch. As a result, parameters learned from a (small) minibatch may receive noisy gradients, which makes it harder for the model to converge.

Got it. But isn’t the question talking about activation functions? How would applying the activation function row-wise or batch-wise affect the learning? Also, if we keep applying the activation function row-wise for batch_size rows, won’t it give the same result as applying it batch-wise to a single batch?

Hey @Kushagra_Chaturvedy, from my understanding, the last question in the exercise is asking what happens if the minibatch size is 1. In that case, the minibatch is too small for the model to converge.
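As a quick check of the row-wise vs. whole-minibatch point raised above (the toy minibatch and the use of MXNet’s np here are just for illustration): because ReLU acts elementwise, applying it one row at a time gives exactly the same result as applying it to the whole minibatch.

from mxnet import np, npx
npx.set_np()

X = np.random.normal(size=(4, 5))            # a toy minibatch: 4 examples, 5 features
relu_batch = np.maximum(X, 0)                # ReLU applied to the whole minibatch at once
relu_rows = np.stack([np.maximum(X[i], 0)    # ReLU applied one row (one example) at a time
                      for i in range(X.shape[0])])
print(np.abs(relu_batch - relu_rows).sum())  # 0.0 -- identical results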

How do we explain the 2nd question?

Hi @sahu.vaibhav! Start thinking from here:

In 4.1.1.3,

For a one-hidden-layer MLP whose hidden layer has h hidden units, denote by \mathbf{H} \in \mathbb{R}^{n \times h} the outputs of the hidden layer, which are hidden representations.

What is the sentence trying to say?

Hi @tinkuge, this sentence defines the hidden representations, i.e., the outputs of the hidden layer. (Many deep learning terms refer to the same thing. :wink: )
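As a rough, made-up illustration of the notation in code: with a minibatch of n examples, d inputs, and h hidden units, the hidden representation \mathbf{H} is simply the n \times h matrix produced by the hidden layer.

from mxnet import np, npx
npx.set_np()

n, d, h = 2, 4, 3                            # minibatch size, inputs, hidden units (arbitrary)
X = np.random.normal(size=(n, d))            # inputs:  X  is n x d
W1 = np.random.normal(size=(d, h))           # weights: W1 is d x h
b1 = np.zeros(h)                             # biases
H = np.maximum(np.dot(X, W1) + b1, 0)        # hidden representations: H is n x h
print(H.shape)                               # (2, 3)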

How do you write a PReLU function from scratch that can be recorded? I wrote the following:

from mxnet import np  # assuming MXNet's np here, since the error below is about autograd recording

def prelu2(x, a=0.01):
    b = np.linspace(0, 0, num=x.size)  # output buffer of zeros, same length as x
    for i in np.arange(x.size):
        if x[i] < 0:
            b[i] = a * x[i]            # in-place item assignment
        else:
            b[i] = x[i]                # in-place item assignment
    return b

But it doesn’t work; it gives an error saying that in-place operations are not permitted when recording.

Hey @asadalam, great try! One way to learn how each operator works is to check its source code. :wink:
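For reference, here is one possible from-scratch sketch (not the library’s implementation) that avoids the in-place item assignments, so it can be recorded by autograd; it replaces the Python loop with np.where.

from mxnet import autograd, np, npx
npx.set_np()

def prelu(x, alpha=0.01):
    # keep x where x > 0, scale by alpha elsewhere -- no in-place writes
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
x.attach_grad()
with autograd.record():                      # recording works: nothing is modified in place
    y = prelu(x)
y.backward()
print(y, x.grad)                             # grad is 1 where x > 0, alpha elsewhere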

Thanks. So PReLU is defined in mxnet.gluon.nn.activations; how does one use it?

Hi @asadalam, are you asking about the API or the underlying technique? If the latter, I recommend reading the paper, which lays out the math rigorously. If the former, first define prelu = nn.PReLU(), then apply this prelu layer in your network. Check the API for more details.
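A usage sketch of that (the layer sizes below are arbitrary):

from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

net = nn.Sequential()
net.add(nn.Dense(256), nn.PReLU(), nn.Dense(10))  # PReLU is used like any other layer
net.initialize()

X = np.random.uniform(size=(2, 20))
print(net(X).shape)                               # (2, 10)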