Generalization

The functions corresponding to your methods ( :thinking: am I right?):

  1. Does DataLoader(drop_last=True) correspond to the first method?
  2. How to fill the last batch?
  3. DataLoader(drop_last=False)?

Why is float() necessary in def add?
self.data = [a + float(b) for a, b in zip(self.data, args)]


Why metric.add(l.sum(), y.numpy().size) or metric.add(l * len(y), y.numpy().size)?
x.nelement() in PyTorch is like x.size in NumPy.
https://pytorch.org/docs/master/generated/torch.nn.MSELoss.html
With reduction='none', MSELoss returns a vector of shape (batch_size,).

Could you explain the meanings of l.sum() and l * len(y) in more detail?
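For reference, a minimal sketch of what the two expressions accumulate, assuming the two reduction modes of torch.nn.MSELoss (tensor values made up for illustration):

    import torch
    from torch import nn

    y_hat = torch.tensor([2.0, 3.0, 4.0])
    y = torch.tensor([1.0, 3.0, 5.0])

    # reduction='none': the loss is a vector of shape (batch_size,),
    # so l.sum() is the total squared error over the batch.
    l = nn.MSELoss(reduction='none')(y_hat, y)
    print(l.sum())      # tensor(2.)

    # Default reduction='mean': the loss is a scalar average,
    # so l * len(y) recovers the same batch total.
    l = nn.MSELoss()(y_hat, y)
    print(l * len(y))   # tensor(2.)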


Hi @StevenJokes,

You can choose last_batch in {‘keep’, ‘discard’, ‘rollover’} in DataLoader (https://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.data.DataLoader.html).

I’m learning PyTorch: https://pytorch.org/docs/stable/data.html
Am I right? DataLoader(drop_last=False)
https://beta.mxnet.io/guide/getting-started/to-mxnet/pytorch.html is helpful!

Hi @StevenJokes, yes!
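To make the mapping concrete, here is a small sketch of the two PyTorch behaviors (the Gluon names in the comments come from the DataLoader docs linked above; as far as I know, ‘rollover’ has no direct built-in PyTorch equivalent):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.arange(10.0))
    # drop_last=True ~ Gluon's last_batch='discard': the short final batch is dropped.
    print([len(b[0]) for b in DataLoader(ds, batch_size=4, drop_last=True)])   # [4, 4]
    # drop_last=False ~ last_batch='keep': the short final batch is kept.
    print([len(b[0]) for b in DataLoader(ds, batch_size=4, drop_last=False)])  # [4, 4, 2]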


@goldpiggy I don’t understand how underfitting happens when we limit our polynomial degree to 3. I understand that the labels were generated using a polynomial of degree 4, and we are training a degree-3 polynomial to produce labels close to those generated by the degree-4 polynomial, so our model will be inaccurate. But how does this relate to underfitting? What characteristic of underfitting does this model show?

@goldpiggy
Also, in section 4.4.4.4,

    # Pick the first four dimensions, i.e., 1, x from the polynomial features
train(poly_features[:n_train, 0:3], poly_features[n_train:, 0:3],
      labels[:n_train], labels[n_train:])

Since we are trying to fit a linear model to demonstrate underfitting as per the heading, shouldn’t we be picking the first two dimensions (not the first four as stated in the code comment) and choose train_features and test_features to be poly_features[:n_train, 0:2] and poly_features[n_train:, 0:2]?
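For what it’s worth, a quick sketch (reusing the chapter’s poly_features construction with made-up values) of what each slice selects:

    import numpy as np

    max_degree = 4
    features = np.array([[2.0], [3.0]])
    # Columns are [x^0, x^1, x^2, x^3] (ignoring the 1/i! scaling for clarity).
    poly_features = np.power(features, np.arange(max_degree).reshape(1, -1))
    print(poly_features[:, 0:2])  # [1, x] -- exactly the features of a linear model
    print(poly_features[:, 0:3])  # the first *three* columns: [1, x, x^2]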

@Kushagra_Chaturvedy Please take a look at the updated code in the master branch of the repo.

I think what @Kushagra_Chaturvedy said is right. https://github.com/d2l-ai/d2l-en/pull/1181

small typo in the exercises:

  1. question:

“Concider” -> “Consider”

Great call! Feel free to fix it via a PR and become a contributor! :wink: https://d2l.ai/chapter_appendix-tools-for-deep-learning/contributing.html

I can’t display the animation in VS Code. Does anyone know how to solve this?

Hello, is the correct answer to question 2, part 3, that the plot of training and generalization loss against the amount of data (losses versus data) should look similar to the one against model complexity (losses versus model complexity)?

In section 4.4.4.5, Higher-Order Polynomial Function Fitting (Overfitting), the test error is quite similar to the training error; we hardly see any gap between them. Ideally, we should have seen a higher test error and a gap between training and test error. Are my inferences correct?

Question 1: w_i = y / (\sum x^i)
Question 2: after degree 2, both errors reach nearly zero.
def train_2(train_features, test_features, train_labels, test_labels,
            num_epochs=400):
    loss = nn.MSELoss()
    input_shape = train_features.shape[-1]
    # Switch off the bias since we already catered for it in the polynomial
    # features
    net = nn.Sequential(nn.Linear(input_shape, 1, bias=False))
    batch_size = min(10, train_labels.shape[0])
    train_iter = d2l.load_array((train_features, train_labels.reshape(-1, 1)),
                                batch_size)
    test_iter = d2l.load_array((test_features, test_labels.reshape(-1, 1)),
                               batch_size, is_train=False)
    trainer = torch.optim.SGD(net.parameters(), lr=0.01)

    for epoch in range(num_epochs):
        d2l.train_epoch_ch3(net, train_iter, loss, trainer)

    return (evaluate_loss(net, train_iter, loss),
            evaluate_loss(net, test_iter, loss))

train_losses = []
test_losses = []
for i in range(1, len(poly_features[0]) + 1):
    train_loss, test_loss = train_2(poly_features[:n_train, :i],
                                    poly_features[n_train:, :i],
                                    labels[:n_train], labels[n_train:])
    train_losses.append(train_loss)
    test_losses.append(test_loss)
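To visualize the sweep, one could append a plotting call like this (assuming d2l.plot as used elsewhere in the chapter):

    d2l.plot(list(range(1, len(poly_features[0]) + 1)),
             [train_losses, test_losses], xlabel='degree', ylabel='loss',
             legend=['train', 'test'], yscale='log')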

Basically, it tells us that increasing complexity reduces the error, but tips into overfitting once beyond a point.


poly_features = np.power(features, np.arange(max_degree).reshape(1, -1))

Why did we use reshape here?

@samyfr7
Maybe we can drop the reshape.
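A quick check (my own sketch) suggests the reshape is indeed optional here, since NumPy broadcasting aligns trailing dimensions either way:

    import numpy as np

    features = np.random.normal(size=(5, 1))
    max_degree = 4
    a = np.power(features, np.arange(max_degree).reshape(1, -1))  # (5,1) ** (1,4)
    b = np.power(features, np.arange(max_degree))                 # (5,1) ** (4,)
    print(np.allclose(a, b))  # True -- both broadcast to shape (5, 4)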

Hi! In Section 4.4.3, under “Model complexity”, it reads “In fact, whenever the data examples each have a distinct value of x, a polynomial function with degree equal to the number of data examples can fit the training set perfectly.”. Although this is true, it could be a little misleading to the unaware reader, who might, for example, think that a second-degree polynomial is needed to perfectly fit two data points, whereas a linear function would suffice. Therefore, it could be more general to say that “[…] a polynomial function with degree d >= n - 1 can fit the training set perfectly, where n is the number of data examples.”.

Great book! =)
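A quick numerical check of the d >= n - 1 claim above (my own sketch):

    import numpy as np

    # Three points with distinct x values: a degree-2 polynomial (d = n - 1)
    # fits them exactly.
    x = np.array([0.0, 1.0, 2.0])
    y = np.array([3.0, -1.0, 4.0])
    coeffs = np.polyfit(x, y, deg=2)
    print(np.allclose(np.polyval(coeffs, x), y))  # True: zero training error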

Hello, when I try to solve the third question:
3. What happens if you drop the normalization (1/i!) of the polynomial features x^i? Can you fix this in some other way?
I found that I cannot get a good result when I increase the degree of the model to 6 or greater, because the gradients explode. I tried to fix it by decreasing the learning rate (1e-4) and increasing the training epochs (1000), but got only a little improvement (3.3 training error and 9.7 test error).
I wonder whether this is the right way to fix the problem, or whether there are better solutions.
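One alternative (my own suggestion, not from the book) is to standardize each un-normalized feature column, so that the higher powers x^i stop dominating the gradients:

    import numpy as np

    poly_features = np.power(features, np.arange(max_degree).reshape(1, -1))
    # Without the 1/i! factor the column magnitudes grow like x^i; rescaling
    # every column except the constant one (whose std is 0) to zero mean and
    # unit variance keeps the optimization well-conditioned.
    cols = poly_features[:, 1:]
    poly_features[:, 1:] = (cols - cols.mean(axis=0)) / cols.std(axis=0)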


Exercises and my attempt

  1. Can you solve the polynomial regression problem exactly? Hint: use linear algebra.
  • Don’t know how to do it exactly (see the sketch after this list).

  2. Consider model selection for polynomials:

    1. Plot the training loss vs. model complexity (degree of the polynomial). What do you observe? What degree of polynomial do you need to reduce the training loss to 0?
    • At about degree 5 it becomes zero.

    2. Plot the test loss in this case.

    3. Generate the same plot as a function of the amount of data.

  3. What happens if you drop the normalization (1/i!) of the polynomial features x^i? Can you fix this in some other way?
  • The test loss and train loss become NaN after the 7th feature.

  • Maybe we can add our own normalization mechanism, though I am not sure.

  4. Can you ever expect to see zero generalization error?
  • Only when the environment conditions are fully known.
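For exercise 1, a minimal sketch of the exact solution via linear algebra (assuming the chapter’s poly_features and labels arrays and its true coefficients [5, 1.2, -3.4, 5.6]):

    import numpy as np

    # Polynomial regression is linear in the weights w, so the design matrix
    # X = poly_features defines the least-squares problem min_w ||Xw - y||^2,
    # which the normal equations solve exactly; np.linalg.lstsq does this stably.
    w, _, _, _ = np.linalg.lstsq(poly_features[:n_train],
                                 labels[:n_train], rcond=None)
    print(w[:4])  # close to the true coefficients [5, 1.2, -3.4, 5.6]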

This is a nice attempt!