Deferred Initialization

http://d2l.ai/chapter_deep-learning-computation/deferred-init.html

Q1. How do we specify the input dimensions of the first layer? I looked at the nn.Dense documentation but could not find how to set both the input and output dimensions.

Q2. We get a runtime error, correct?

Q3. Don’t we use padding, if, for example, we have sentences of varying length? I guess the author here means using the same params on different inputs.
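
To make the padding idea in Q3 concrete, here is a minimal pure-Python sketch. The helper name pad_sequences and the token ids are made up for illustration and are not taken from the book or from MXNet. It assumes padding on the right with a fixed value, so every sentence reaches the same length and the first layer always sees the same input dimension.

```python
def pad_sequences(sequences, max_len, pad_value=0):
    """Truncate or right-pad each sequence to exactly max_len items."""
    padded = []
    for seq in sequences:
        clipped = list(seq[:max_len])
        padded.append(clipped + [pad_value] * (max_len - len(clipped)))
    return padded


# Hypothetical token ids for three sentences of different lengths.
sentences = [[4, 7, 1], [9, 2], [3, 8, 5, 6, 2]]
batch = pad_sequences(sentences, max_len=4)
print(batch)  # [[4, 7, 1, 0], [9, 2, 0, 0], [3, 8, 5, 6]]
```

With a batch like this, the input dimension never changes, so the question of re-inferring shapes does not come up.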

@anirudh I do not see a PyTorch implementation for this section. Is it because deferred initialization is not possible with PyTorch? Or can we use LazyLinear for the same functionality?

Yes, @sushmit86 Deferred init is not applicable in PyTorch.

@sushmit86 @Reza_Afra

I tested and confirmed that the input dimension is set with the in_units argument of the nn.Dense constructor. The output dimension is the number of rows of W in the matrix product W·Xᵀ, which is the units argument.

I got an error because the matrix product failed on mismatched dimensions.
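
To illustrate both points without depending on MXNet, here is a rough pure-Python sketch of the idea. LazyDense is a hypothetical toy class, not Gluon's implementation, and the initialization scheme (small Gaussian weights, zero bias) is an assumption. If in_units is given, the weights are created immediately; otherwise they are created on the first call, and any later input with a different width raises an error.

```python
import random


class LazyDense:
    """Toy dense layer (hypothetical, not MXNet code) with deferred init."""

    def __init__(self, units, in_units=None, seed=0):
        self.units = units
        self.in_units = in_units
        self.weight = None  # created once the input size is known
        self.bias = None
        self._rng = random.Random(seed)
        if in_units is not None:
            # Input size given up front: initialize immediately.
            self._init_params(in_units)

    def _init_params(self, in_units):
        self.in_units = in_units
        # weight has shape (units, in_units): one row per output unit.
        self.weight = [[self._rng.gauss(0.0, 0.01) for _ in range(in_units)]
                       for _ in range(self.units)]
        self.bias = [0.0] * self.units

    def __call__(self, X):
        n_features = len(X[0])
        if self.weight is None:
            # Deferred initialization: infer the input size from the data.
            self._init_params(n_features)
        elif n_features != self.in_units:
            raise ValueError(
                f"expected {self.in_units} input features, got {n_features}")
        # Each output row is W times the input row, plus the bias.
        return [[sum(w * x for w, x in zip(w_row, x_row)) + b
                 for w_row, b in zip(self.weight, self.bias)]
                for x_row in X]


layer = LazyDense(3)                      # input size unknown so far
out = layer([[1.0, 2.0], [3.0, 4.0]])     # first call fixes in_units to 2
print(layer.in_units, len(out), len(out[0]))  # 2 2 3
```

Calling layer([[1.0, 2.0, 3.0]]) afterwards raises ValueError in this sketch, which is the analogue of the shape mismatch error described above.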

I think it is possible to build an alternative input layer that shares its parameters (or part of them) with the nominal input layer. When the sequential network is fed data with the alternative dimension, that data goes to the alternative layer, and the result is then fed into the remaining upper layers of the sequential network.

The following Python script applies this approach with mxnet.gluon.

However, I am not sure backpropagation will work correctly with it.

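from mxnet import gluon, np, npx

# Enable the NumPy-compatible interface used by np.zeros / np.random below
npx.set_np()

class MlpFlexibleInputLayer(gluon.nn.Block) :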
    def __init__(self, dim2, **kwargs) :
        super().__init__(**kwargs)
        #----------------------------------------------------------
        # Store alternative input dimension
        #----------------------------------------------------------
        self.otherDim = dim2

        #----------------------------------------------------------
        # Build network processing nominal input data
        # It is also initialized
        #----------------------------------------------------------
        self._net = gluon.nn.Sequential()
        self._net.add(gluon.nn.Dense(10))
        self._net.add(gluon.nn.Dense(1))
        self._net.initialize()

        #----------------------------------------------------------
        # Alternative input layer with alternative input dimension
        #----------------------------------------------------------
        self._inputBlock = gluon.nn.Dense(10, in_units=self.otherDim)
        self._inputBlock.initialize()

        #----------------------------------------------------------
        # Feed alternative layer with dummy data 
        # in order to access weights and bias
        #----------------------------------------------------------
        x = np.zeros((1,self.otherDim))
        self._inputBlock(x)

    def forward(self, X) :
        if self.otherDim == X.shape[1] :
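            #----------------------------------------------------------
            # Share weights from the nominal input layer.
            # The last weight column is not shared.
            # Assumes the nominal layer has already seen nominal input,
            # so its deferred weights exist with otherDim-1 columns.
            #----------------------------------------------------------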
            self._inputBlock.weight.data()[:,:self.otherDim-1]=self._net[0].weight.data()
            self._inputBlock.bias.data()[:]=self._net[0].bias.data()[:]

            #----------------------------------------------------------
            # Feed alternative layer with input with alternative dimension
            #----------------------------------------------------------
            X2 = self._inputBlock(X)

            #----------------------------------------------------------
            # Feed other layers
            #----------------------------------------------------------
            y  = self._net[1](X2)
        else : 
            #----------------------------------------------------------
            # Feed network with nominal input
            #----------------------------------------------------------
            y = self._net(X)

        return y


#----------------------------------------------------------
# Build input data with nominal dimension
#----------------------------------------------------------
nbdim=3
X = np.random.normal(0., 0.5, (10,nbdim))

#----------------------------------------------------------
# Build sequential network that supports an alternative dimension
#----------------------------------------------------------
mlp = MlpFlexibleInputLayer(nbdim+1)
print("Returned shape from nominal input : {}".format(mlp(X).shape))

#----------------------------------------------------------
# Build input data with alternative dimension
#----------------------------------------------------------
X1 = np.random.normal(0., 0.5, (10,nbdim+1))
print("Returned shape from alternative input : {}".format(mlp(X1).shape))

#----------------------------------------------------------
# Feed sequential network with nominal data
#----------------------------------------------------------
print("Returned shape from nominal input : {}".format(mlp(X).shape))

#----------------------------------------------------------
# Feed sequential network with alternative data 
#----------------------------------------------------------
X1 = np.random.normal(0., 2., (5,nbdim+1))
print("Returned shape from alternative input : {}".format(mlp(X1).shape))

@Reza_Afra I think you are using mxnet. The question I had was for PyTorch.