Dropout

Hi AdaV, when I implemented it, it somehow was the case. But I am not sure of the veracity of my claims. I guess I am the most unreliable person in this chat! XD

Since this is my first post, I was not allowed to post any embedded content. I wrote up a quick set of notes here:

I would love some guidance on question 3: how might we visualize or calculate the activations, or the variance of the activations, of the hidden-layer units?
Thanks
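
Not sure this is what the exercise intends, but one idea I had is to capture the hidden activations with a forward hook and look at their per-unit variance. A rough PyTorch sketch, where the architecture, layer sizes, and input batch are all made-up placeholders:

```python
import torch
from torch import nn

# Placeholder MLP with dropout; sizes are arbitrary, not from the book.
net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                    nn.Dropout(0.5), nn.Linear(256, 10))

# Record the ReLU outputs (the hidden activations) on every forward pass.
activations = []
net[1].register_forward_hook(
    lambda module, inputs, output: activations.append(output.detach()))

X = torch.randn(512, 784)   # stand-in for a batch of inputs
net(X)

h = torch.cat(activations)  # shape: (num_examples, num_hidden)
print(h.var(dim=0))         # variance of each hidden unit across the batch
```

From there you could histogram the per-unit variances, or compare them with dropout enabled (`net.train()`) versus disabled (`net.eval()`).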

I am confused by the last line of Sec. 5.6:

By design, the expectation remains unchanged, i.e., E[h’] = h

Is this correct, or should it be E[h'] = E[h]?
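
If I read it right, the expectation there is over the dropout mask with h held fixed, so E[h' | h] = (1 - p) * h / (1 - p) = h; taking the outer expectation over h then also gives E[h'] = E[h], so both readings agree. A quick numerical sanity check (my own sketch, with arbitrary values):

```python
import numpy as np

# Inverted dropout: h' = 0 with probability p, h / (1 - p) otherwise,
# so conditioned on h, E[h'] = (1 - p) * h / (1 - p) = h.
rng = np.random.default_rng(0)
h, p, n = 2.5, 0.5, 1_000_000
keep = rng.random(n) > p                  # keep with probability 1 - p
h_prime = np.where(keep, h / (1 - p), 0.0)
print(h_prime.mean())                     # ~2.5, i.e. E[h'] = h
```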

Exercise 6:
Dropping out one row of W(2) at a time is equivalent to dropout on the hidden layer.
Dropping out one column of W(2) at a time is equivalent to dropout on the output layer.
Fully random dropout on the entries of W(2) probably leads to slower convergence (see the sketch below).
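
A small check of the row and column claims (my own sketch, assuming the convention o = h @ W2 with W2 of shape (num_hidden, num_outputs); the sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(1, 4))    # hidden activations
W2 = rng.normal(size=(4, 3))   # second-layer weights

# Zeroing row i of W2 removes hidden unit i's entire contribution,
# which matches zeroing h_i itself (dropout on the hidden layer).
W2_row = W2.copy(); W2_row[1, :] = 0
h_drop = h.copy();  h_drop[0, 1] = 0
print(np.allclose(h @ W2_row, h_drop @ W2))   # True

# Zeroing column j of W2 zeros output unit j (dropout on the output layer).
W2_col = W2.copy(); W2_col[:, 2] = 0
print((h @ W2_col)[0, 2])                     # 0.0
```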

My solutions to the exercises of Sec. 5.6: