So here, right after I call dense.initialize(), the weights are already initialized. This is in contrast with deferred initialization, and it makes sense because we specified in_units, so the weight shape is known up front.
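To make the difference concrete, here is a minimal pure-Python sketch of the idea. It is not how MXNet implements it: `ToyDense` and its `_allocate` helper are hypothetical names made up for illustration, and the only assumption carried over from the real layer is that weights can be allocated as soon as the input size is known.

```python
import random


class ToyDense:
    """Toy stand-in for a dense layer (hypothetical, not MXNet's code)."""

    def __init__(self, units, in_units=None):
        self.units = units
        self.in_units = in_units
        self.weight = None  # allocated by initialize() or on the first call

    def _allocate(self):
        # Weight matrix of shape (in_units, units) with small random values
        self.weight = [[random.gauss(0.0, 0.01) for _ in range(self.units)]
                       for _ in range(self.in_units)]

    def initialize(self):
        # Input size known: allocate right away. Otherwise defer.
        if self.in_units is not None:
            self._allocate()

    def __call__(self, x):
        if self.weight is None:
            # Deferred path: infer in_units from the first input
            self.in_units = len(x)
            self._allocate()
        return [sum(x[i] * self.weight[i][j] for i in range(self.in_units))
                for j in range(self.units)]


# in_units given: weights exist immediately after initialize()
eager = ToyDense(units=3, in_units=5)
eager.initialize()
eager_ready = eager.weight is not None

# in_units omitted: weights appear only after the first forward pass
lazy = ToyDense(units=3)
lazy.initialize()
lazy_ready_before = lazy.weight is not None
out = lazy([1.0, 2.0, 3.0, 4.0])
lazy_ready_after = lazy.weight is not None

print(eager_ready, lazy_ready_before, lazy_ready_after)  # True False True
```

The second layer only learns its input size (4) from the data it is first called on, which is the behavior that specifying in_units avoids.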
Is MyDense() as efficient as nn.Dense()?
I’m afraid not. At the very least, mxnet.gluon.nn.Dense is a HybridBlock, so it can be hybridized (see 12.1.2), and its backend is implemented in C/C++.