http://d2l.ai/chapter_deep-learning-computation/parameters.html

Chap 5.2.2.1

In the first and second PyTorch code blocks, for the function

**init_normal(m)**

I guess it should be

**nn.init.XXX(m.weight, *args)**

since

`torch.nn.Module.apply(fn)`: Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self.

It doesn't make sense to repeatedly initialize **net[0]** when the goal is to initialize all parameters.
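Concretely, I would expect something like the following (my guess at the intended code; the `net` architecture and the mean/std values are just what I recall from the section):

```python
import torch.nn as nn

# Roughly the network from the section (my recollection)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def init_normal(m):
    # m is the submodule handed in by net.apply(), so we initialize m, not net[0]
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)  # recursively calls init_normal on every submodule
```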

Hi @kwang, that's a great catch. If we are to apply initialization to all the `Linear` layers in the network, then we should replace `net[0]` with `m` inside the `init_normal` function.

P.S. While I was at it, I have fixed the naming of the functions too. The second function should be named `init_constant`.
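For reference, the corrected constant initializer would look roughly like this (reusing the `net` defined above; the constant value 1 is just what I recall from the section):

```python
def init_constant(m):
    # Same pattern: operate on the module m passed in by apply()
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net.apply(init_constant)
```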

This is now fixed in master. You can soon see the changes in the next update to the release branch.

Thanks!

For tied parameters (link), why is the gradient the sum of the gradients of the two layers? I was thinking it would be the product of the gradients of the two layers. Reasoning:

y = f(f(x))

dy/dx = f'(f(x)) * f'(x), where x is a vector denoting the shared parameters.

Maybe the grad of the shared layer is not reset to 0 after the first time we encounter it during backpropagation, so the grad of the shared layer is accumulated twice and the final grad is the sum?
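A toy sketch to check this intuition (my own example, not from the book): when the same parameter is used twice, autograd sums the contribution from each use rather than multiplying them.

```python
import torch

# y = w * (w * x): the same parameter w is used in two "layers"
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)

h = w * x   # first use of the shared parameter
y = w * h   # second use of the shared parameter
y.backward()

# dy/dw = h + w*x = w*x + w*x = 2*w*x = 12,
# i.e. the per-use gradient contributions are summed
print(w.grad)  # tensor(12.)
```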

For tied parameters in 5.2.3, `print(net[2].weight.data[0] == net[4].weight.data[0])`, I am guessing it should have been `print(net[2].weight.data[0] == net[3].weight.data[0])`, given that the parameters are shared between layer 2 and layer 3.