Hi AdaV, when I implemented it, it somehow was the case. But I am not sure of the veracity of my claims. I guess I am the most unreliable person on this chat! XD
Since this is my first post, I was not allowed to post any embedded content. I wrote up a quick set of notes here:
I would love some guidance on Question 3: how might we visualize or calculate the activations, or the variance of the activations, of the hidden-layer units?
Thanks
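One possible approach, sketched below: register forward hooks on the hidden layers and record the per-unit variance of their activations over a mini-batch, then log those numbers each epoch and plot them. The architecture, layer indices, and input shape here are just illustrative assumptions, not the chapter's exact code.

```python
import torch
from torch import nn

# A hypothetical two-hidden-layer MLP, only for illustration
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 10))

activations = {}

def save_activation(name):
    # Forward hook that stores a layer's output under the given name
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach hooks to the two ReLU layers (indices follow the toy model above)
net[2].register_forward_hook(save_activation('hidden1'))
net[4].register_forward_hook(save_activation('hidden2'))

X = torch.randn(64, 1, 28, 28)   # a fake mini-batch
net(X)

for name, act in activations.items():
    # Variance of each hidden unit across the batch, averaged over units;
    # logging this per epoch gives a curve you can plot and compare
    print(name, act.var(dim=0).mean().item())
```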
I am confused by the last line of Sec. 5.6:
"By design, the expectation remains unchanged, i.e., E[h'] = h."
Is this correct, or should it be E[h'] = E[h]?
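For what it's worth, if the expectation is taken over the dropout mask alone, with the activation h treated as fixed, the usual inverted-dropout calculation gives

$$
E[h'] = p \cdot 0 + (1 - p) \cdot \frac{h}{1 - p} = h,
$$

so E[h'] = h holds conditionally on h; taking a further expectation over h then also yields E[h'] = E[h].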
Exercise 6:
Dropping out one row of W(2) at a time is equivalent to dropout on the hidden layer (a quick numerical check of this case is sketched below).
Dropping out one column of W(2) at a time is equivalent to dropout on the output layer.
Fully random dropout of individual entries of W probably leads to slower convergence.
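Here is a minimal numerical check of the row case, assuming the row-vector convention O = H W(2) used in the chapter; the shapes and names below are illustrative.

```python
import torch

torch.manual_seed(0)
H = torch.randn(4, 5)    # batch of hidden activations
W2 = torch.randn(5, 3)   # hidden-to-output weights
j = 2                    # hidden unit / row to drop

# Zero the j-th row of W(2) ...
W2_drop = W2.clone()
W2_drop[j, :] = 0

# ... versus zeroing the j-th hidden unit (column of H)
H_drop = H.clone()
H_drop[:, j] = 0

# Both remove hidden unit j's contribution to the output
print(torch.allclose(H @ W2_drop, H_drop @ W2))  # True
```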
Hi
I don't understand the point of this: X.reshape((X.shape[0], -1))
It will just reshape X to the same shape.
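For context, a small sketch of what that call does when X is a batch of images rather than an already-flat matrix; the shapes here are illustrative (in the chapter X typically arrives as a batch of 1x28x28 Fashion-MNIST images).

```python
import torch

X = torch.randn(256, 1, 28, 28)              # e.g. a mini-batch of images
X_flat = X.reshape((X.shape[0], -1))         # keep the batch dim, flatten the rest
print(X_flat.shape)                          # torch.Size([256, 784])

# Only if X is already 2-D is the call a no-op shape-wise
X2 = torch.randn(256, 784)
print(X2.reshape((X2.shape[0], -1)).shape)   # torch.Size([256, 784])
```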