Dropout

Hi AdaV, when I implemented it, that did seem to be the case. But I am not sure of the veracity of my claims. I guess I am the most unreliable person on this chat! XD

Since this is my first post, I was not allowed to post any embedded content. I wrote up a quick set of notes here-

I would love some guidance on question 3. How might we visualize or calculate the activations, or the variance of the activations, of hidden-layer units?
Thanks
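
One way I can think of, if it helps: register a forward hook on the hidden layer and look at the recorded activations. Below is a minimal sketch; the model, the layer index, and the batch shape are just illustrative stand-ins, not the book's code.

    import torch
    from torch import nn

    # Illustrative MLP in the style of this section.
    net = nn.Sequential(
        nn.Flatten(),
        nn.LazyLinear(256), nn.ReLU(), nn.Dropout(0.5),
        nn.LazyLinear(10))

    activations = []

    def record(module, inputs, output):
        # Save the post-ReLU hidden activations of the current batch.
        activations.append(output.detach())

    # Index 2 in the Sequential above is the ReLU after the hidden layer.
    net[2].register_forward_hook(record)

    X = torch.randn(256, 1, 28, 28)  # stand-in for a Fashion-MNIST batch
    net(X)

    h = activations[-1]          # shape: (batch_size, num_hiddens)
    print(h.var(dim=0))          # per-unit variance across the batch
    print(h.var(dim=0).mean())   # a single summary number to compare settings

From there you could compare the mean per-unit variance with and without dropout, or plot a histogram of h to visualize the activations.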

I am confused by the last line of Sec. 5.6:

By design, the expectation remains unchanged, i.e., E[h'] = h

Is that correct, or should it be E[h'] = E[h]?
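
For what it's worth, here is the expectation written out from the inverted-dropout definition earlier in the section (the expectation is over the random dropout mask, with h held fixed):

    % Inverted dropout as defined in the section:
    h' =
    \begin{cases}
      0 & \text{with probability } p, \\
      \dfrac{h}{1-p} & \text{with probability } 1-p,
    \end{cases}
    \qquad
    \mathbb{E}[h' \mid h] = p \cdot 0 + (1-p) \cdot \frac{h}{1-p} = h.

Read that way, both statements are consistent: E[h' | h] = h, and taking the expectation over h on both sides gives E[h'] = E[h].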

Exercise 6:
Dropping out one row of W(2) at a time is equivalent to applying dropout to the hidden layer.
Dropping out one column of W(2) at a time is equivalent to applying dropout to the output layer.
Dropping out entries of W completely at random probably leads to slower convergence.
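
A quick numerical check of the first claim, assuming the h W(2) + b(2) convention used in the chapter (the shapes and values below are made up, and the 1/(1-p) rescaling is left out):

    import torch

    torch.manual_seed(0)
    h = torch.randn(4, 5)    # hidden activations: (batch_size, num_hiddens)
    W2 = torch.randn(5, 3)   # second-layer weights: (num_hiddens, num_outputs)

    i = 2  # index of the hidden unit / row to drop

    # (a) zero out row i of W2
    W2_dropped = W2.clone()
    W2_dropped[i, :] = 0
    out_a = h @ W2_dropped

    # (b) zero out hidden unit i, i.e. dropout on the hidden layer
    h_dropped = h.clone()
    h_dropped[:, i] = 0
    out_b = h_dropped @ W2

    print(torch.allclose(out_a, out_b))  # True: the two operations match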

My solutions to the exercises of Sec. 5.6:

Hi,
I don't understand the point of this: X.reshape((X.shape[0], -1))
It seems to just reshape X into the same shape it already has.
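
For what it's worth, a quick check of when that reshape matters, assuming Fashion-MNIST-shaped inputs (the shapes below are my own example, not the book's):

    import torch

    # For an already-flat matrix the reshape is indeed a no-op:
    X = torch.randn(256, 784)
    print(X.reshape((X.shape[0], -1)).shape)   # torch.Size([256, 784])

    # For a raw image batch it flattens each example into a vector,
    # which is the shape the fully connected layers expect:
    X = torch.randn(256, 1, 28, 28)
    print(X.reshape((X.shape[0], -1)).shape)   # torch.Size([256, 784])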

My exercise answers:

  1. Decreasing dropout: I didn't see any change in the results;
    increasing dropout: val_acc decreases significantly when dropout > 0.9.

  2. Without dropout: I see sudden decreases and increases of val_acc as the epochs go on. Is this a sign of overfitting? Double descent?

  3. I guess the variance will increase after dropout is applied.

  4. I think applying dropout at test time will decrease the model's performance, and there is no benefit to doing so.

  5. I find that adding weight decay reduced the performance of my MLP; the performance ranking is MLP+WD < MLP+WD+dropout < MLP+dropout. Is this because WD impaired the expressive power of the MLP? Below is my code, and a minimal usage sketch follows after item 7; I'm not sure if it's correct:
    import torch
    from torch import nn
    from d2l import torch as d2l

    class WD_DropOutMLP(d2l.Classifier):
        """MLP with dropout; weight decay is applied to the weights only."""
        def __init__(self, num_outputs, num_hiddens_1, num_hiddens_2,
                     dropout_1, dropout_2, lr, wd):
            super().__init__()
            self.save_hyperparameters()  # stores lr, wd, etc. as attributes
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(num_hiddens_1), nn.ReLU(), nn.Dropout(dropout_1),
                nn.LazyLinear(num_hiddens_2), nn.ReLU(), nn.Dropout(dropout_2),
                nn.LazyLinear(num_outputs))

        def configure_optimizers(self):
            # Apply weight decay to the weight matrices but not to the biases.
            params = list(self.net.named_parameters())
            weight_params = [param for name, param in params if 'weight' in name]
            bias_params = [param for name, param in params if 'bias' in name]
            return torch.optim.SGD([
                {'params': weight_params, 'weight_decay': self.wd},
                {'params': bias_params}], lr=self.lr)

  6. TBD

  7. TBD
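
Re: item 5, a minimal usage sketch for the class above, assuming the d2l.FashionMNIST data class and d2l.Trainer used elsewhere in the book (the hyperparameter values are arbitrary):

    from d2l import torch as d2l

    # Arbitrary hyperparameters, just to show how the class is wired up.
    model = WD_DropOutMLP(num_outputs=10, num_hiddens_1=256, num_hiddens_2=256,
                          dropout_1=0.5, dropout_2=0.5, lr=0.1, wd=1e-4)
    data = d2l.FashionMNIST(batch_size=256)
    trainer = d2l.Trainer(max_epochs=10)
    trainer.fit(model, data)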