Hi AdaV, when I implemented it, it somehow was the case. But I am not sure of the veracity of my claims. I guess I am the most unreliable person on this chat! XD
Since this is my first post, I was not allowed to post any embedded content. I wrote up a quick set of notes here-
I would love some guidance on question 3. How might we visualize or calculate the activations, or the variance of the activations, of the hidden layer units?
Thanks
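One way I can think of to approach question 3 (just a sketch, not an official solution; the two-layer net, the 0.5 dropout rate, and the random batch are placeholders): register a forward hook on the layer of interest and compare the variance of its activations with dropout active (train mode) and disabled (eval mode).

```python
import torch
from torch import nn

torch.manual_seed(0)
# Placeholder MLP in the spirit of the section's model
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                    nn.Dropout(0.5), nn.Linear(256, 10))

# Record the output of the dropout layer (index 3 in this Sequential)
activations = []
handle = net[3].register_forward_hook(
    lambda module, inputs, output: activations.append(output.detach()))

X = torch.randn(128, 1, 28, 28)  # stand-in for a Fashion-MNIST batch
net.train()   # dropout active
net(X)
net.eval()    # dropout disabled
net(X)
handle.remove()

print('activation variance with dropout   :', activations[0].var().item())
print('activation variance without dropout:', activations[1].var().item())
```

Per-unit variances can be computed with `activations[0].var(dim=0)` and plotted with matplotlib if a visualization is wanted.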
I am confused about the last line of Sec. 5.6:
By design, the expectation remains unchanged, i.e., E[h'] = h
Is that correct, or should it be E[h'] = E[h]?
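For reference, writing out the dropout definition from the section, with the expectation taken over the dropout mask while h is held fixed:

$$
h' = \begin{cases} 0 & \text{with probability } p,\\ \dfrac{h}{1-p} & \text{otherwise,} \end{cases}
\qquad
E[h' \mid h] = p \cdot 0 + (1-p)\cdot\frac{h}{1-p} = h.
$$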
Exercise 6:
Dropping out one row of W(2) at a time is equivalent to dropout on the hidden layer, and dropping out one column of W(2) at a time is equivalent to dropout on the output layer (see the quick check below).
Fully random dropout on W probably leads to slower convergence.
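A quick numerical check of the row/column claims, assuming the book's convention O = HW(2) and ignoring the bias; the sizes and the 0.5 rate are arbitrary:

```python
import torch

torch.manual_seed(0)
num_hiddens, num_outputs, p = 4, 3, 0.5
H = torch.randn(2, num_hiddens)             # hidden activations for a batch of 2
W2 = torch.randn(num_hiddens, num_outputs)  # O = H @ W2

# Inverted-dropout mask applied to the rows of W2
row_mask = (torch.rand(num_hiddens, 1) > p).float() / (1 - p)
out_row_dropped = H @ (W2 * row_mask)        # drop rows of W2
out_hidden_dropped = (H * row_mask.T) @ W2   # drop hidden units with the same mask
print(torch.allclose(out_row_dropped, out_hidden_dropped))  # True

# Similarly, masking columns of W2 matches dropout on the outputs
col_mask = (torch.rand(1, num_outputs) > p).float() / (1 - p)
print(torch.allclose(H @ (W2 * col_mask), (H @ W2) * col_mask))  # True
```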
Hi,
I don't understand the point of this: X.reshape((X.shape[0], -1))
It will reshape X to the same shape.
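For what it's worth, a tiny check (the shapes here are just the Fashion-MNIST ones): the reshape only leaves X unchanged when X is already 2-D; for an image batch it flattens everything except the batch dimension.

```python
import torch

X2d = torch.randn(32, 784)        # already flat: shape stays the same
X4d = torch.randn(32, 1, 28, 28)  # image batch as it comes out of the data loader

print(X2d.reshape((X2d.shape[0], -1)).shape)  # torch.Size([32, 784])
print(X4d.reshape((X4d.shape[0], -1)).shape)  # torch.Size([32, 784])
```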
My exercise notes:

- Decrease dropout: didn't see any change in the results; increase dropout: val_acc decreases significantly when dropout > 0.9.
- Without dropout: val_acc suddenly drops and recovers as the number of epochs increases. Is this a sign of overfitting? Double descent?
- I guess the variance will increase after dropout is applied.
- I think dropout would decrease the performance of the model at test time, and you gain no benefit from doing so.
- I find that adding weight decay reduced the performance of my MLP; the performance ranking was MLP+WD < MLP+WD+dropout < MLP+dropout. Is this because WD impaired the expressive ability of the MLP? Below is my code; I'm not sure if it is correct (a quick usage sketch is appended at the end of this post):
```python
import torch
from torch import nn
from d2l import torch as d2l

class WD_DropOutMLP(d2l.Classifier):
    def __init__(self, num_outputs, num_hiddens_1, num_hiddens_2,
                 dropout_1, dropout_2, lr, wd):
        super().__init__()
        self.save_hyperparameters()
        self.wd = wd
        self.net = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(num_hiddens_1), nn.ReLU(), nn.Dropout(dropout_1),
            nn.LazyLinear(num_hiddens_2), nn.ReLU(), nn.Dropout(dropout_2),
            nn.LazyLinear(num_outputs))

    def configure_optimizers(self):
        # Apply weight decay to the weights only, not to the biases
        params = list(self.net.named_parameters())
        weight_params = [param for name, param in params if 'weight' in name]
        bias_params = [param for name, param in params if 'bias' in name]
        return torch.optim.SGD([
            {'params': weight_params, 'weight_decay': self.wd},
            {'params': bias_params}], lr=self.lr)
```
- TBD
- TBD
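A minimal way to exercise the WD_DropOutMLP class above with the d2l training loop, to sanity-check it (the hyperparameter values here are placeholders, not tuned):

```python
data = d2l.FashionMNIST(batch_size=256)
model = WD_DropOutMLP(num_outputs=10, num_hiddens_1=256, num_hiddens_2=256,
                      dropout_1=0.5, dropout_2=0.5, lr=0.1, wd=1e-4)
trainer = d2l.Trainer(max_epochs=10)
trainer.fit(model, data)
```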