```python
# Pick the first four dimensions, i.e., 1, x from the polynomial features
train(poly_features[:n_train, 0:3], poly_features[n_train:, 0:3],
      labels[:n_train], labels[n_train:])
```
Since we are trying to fit a linear model to demonstrate underfitting, as per the heading, shouldn't we pick only the first two dimensions (not the first four, as the code comment says) and set train_features and test_features to poly_features[:n_train, 0:2] and poly_features[n_train:, 0:2]?
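If that reading is right, the underfitting demo uses only the constant and linear columns. Here is a standalone numpy sketch (illustrative data, outside the d2l pipeline; the coefficients mimic the book's cubic target) showing why the two columns (1, x) underfit:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
# Target in the book's style: sum_i w_i * x^i / i! for i = 0..3
true_w = np.array([5.0, 1.2, -3.4, 5.6])
factorials = np.array([1.0, 1.0, 2.0, 6.0])
poly = np.stack([x**i / factorials[i] for i in range(4)], axis=1)
y = poly @ true_w + rng.normal(scale=0.1, size=x.shape)

def fit_mse(features, y):
    # Exact least-squares fit; returns the mean squared error on the data
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    resid = features @ w - y
    return float(np.mean(resid ** 2))

mse_linear = fit_mse(poly[:, 0:2], y)  # only 1, x -> linear model, underfits
mse_cubic = fit_mse(poly, y)           # all four terms -> fits well
assert mse_linear > mse_cubic
```

The linear fit cannot absorb the quadratic and cubic terms, so its error stays far above the noise floor, which is exactly the underfitting picture the section wants to show.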
Hello, is the correct answer to question 2, part 3, that the plot of training and generalization loss against the amount of data should be similar in shape to the plot of losses against model complexity?
In Section 4.4.4.5, Higher-Order Polynomial Function Fitting (Overfitting), the test error is quite similar to the training error; we hardly see any gap between the two. Ideally, we should have seen a higher test error and a clear gap between training and test error. Are my inferences correct?
Question 1: $w_i = y / \sum x^i$
Question 2: after degree 2, both training and test losses reach nearly zero error.
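For question 1, another way to solve the polynomial regression exactly is the normal equations, $w = (X^\top X)^{-1} X^\top y$. A minimal numpy sketch (the data and coefficients here are illustrative, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.stack([x**i for i in range(4)], axis=1)  # design matrix: 1, x, x^2, x^3
true_w = np.array([5.0, 1.2, -3.4, 5.6])
y = X @ true_w  # noise-free, so the exact solution recovers true_w

# Normal equations: w = (X^T X)^{-1} X^T y
w_exact = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(w_exact, true_w)
```

With noisy labels the same formula gives the least-squares minimizer rather than the true weights; `np.linalg.lstsq` is the numerically safer routine for ill-conditioned design matrices.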
```python
def train_2(train_features, test_features, train_labels, test_labels,
            num_epochs=400):
    loss = nn.MSELoss()
    input_shape = train_features.shape[-1]
    # Switch off the bias since we already catered for it in the polynomial
    # features
    net = nn.Sequential(nn.Linear(input_shape, 1, bias=False))
    batch_size = min(10, train_labels.shape[0])
    train_iter = d2l.load_array((train_features, train_labels.reshape(-1, 1)),
                                batch_size)
    test_iter = d2l.load_array((test_features, test_labels.reshape(-1, 1)),
                               batch_size, is_train=False)
    trainer = torch.optim.SGD(net.parameters(), lr=0.01)
    for epoch in range(num_epochs):
        d2l.train_epoch_ch3(net, train_iter, loss, trainer)
    return (evaluate_loss(net, train_iter, loss),
            evaluate_loss(net, test_iter, loss))

train_losses = []
test_losses = []
for i in range(1, len(poly_features[0]) + 1):
    train_loss, test_loss = train_2(poly_features[:n_train, :i],
                                    poly_features[n_train:, :i],
                                    labels[:n_train], labels[n_train:])
    train_losses.append(train_loss)
    test_losses.append(test_loss)
```
Basically, it shows that increasing model complexity reduces the error at first, but leads to overfitting once we go beyond a certain point.
Hi! In Section 4.4.3, under “Model complexity”, it reads “In fact, whenever the data examples each have a distinct value of x, a polynomial function with degree equal to the number of data examples can fit the training set perfectly.”. Although this is true, it could be a little misleading to the unaware reader, who might, for example, think that a second-degree polynomial is needed to perfectly fit two data points, whereas a linear function would suffice. Therefore, it could be more general to say that “[…] a polynomial function with degree d >= n - 1 can fit the training set perfectly, where n is the number of data examples.”.
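As a quick sanity check of the $d \ge n - 1$ claim, a degree $n-1$ polynomial passes exactly through $n$ points with distinct $x$ values. A standalone numpy sketch (not code from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = np.sort(rng.normal(size=n))   # n distinct x values
y = rng.normal(size=n)            # arbitrary targets

# Degree n-1 interpolating polynomial through all n points
coeffs = np.polyfit(x, y, deg=n - 1)
fitted = np.polyval(coeffs, x)
assert np.allclose(fitted, y)     # the training set is fit perfectly
```

The same check with `deg=1` and two points shows the linear-function case mentioned above: the degree only needs to be at least $n - 1$, not equal to $n$.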
Hello, when I try to solve the third question:
3. What happens if you drop the normalization ($1/i!$) of the polynomial features $x^i$? Can you fix this in some other way?

I found that I cannot get a good answer when I increase the degree of the model to 6 or greater, because the gradients explode. I tried to fix it by decreasing the learning rate (to 1e-4) and increasing the number of training epochs (to 1000), but got only a small improvement (3.3 training error and 9.7 test error). I wonder whether this is the right way to fix the problem, or whether there are better solutions.
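One alternative fix (a hedged, self-contained numpy sketch, not the book's official solution): standardize each un-normalized feature column to zero mean and unit variance, which keeps the columns at comparable scales even without the $1/i!$ factor, so gradients stay well behaved:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
max_degree = 8
# Un-normalized polynomial features: columns 1, x, x^2, ..., x^8
feats = np.stack([x**i for i in range(max_degree + 1)], axis=1)

# High-degree columns dwarf low-degree ones, which is what blows up gradients
raw_scales = feats.std(axis=0)

# Standardize every column except the constant one
std_feats = feats.copy()
mu = std_feats[:, 1:].mean(axis=0)
sigma = std_feats[:, 1:].std(axis=0)
std_feats[:, 1:] = (std_feats[:, 1:] - mu) / sigma

assert raw_scales.max() > 100 * raw_scales[1:].min()  # raw scales vary wildly
assert np.allclose(std_feats[:, 1:].std(axis=0), 1.0)  # now comparable
```

After standardization, SGD with the original learning rate typically converges again; the $1/i!$ factor in the book plays essentially the same scale-control role.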
Why does this plot show overfitting? I don't understand: the training and test losses both decrease steadily to about 0.01, and the test loss never bounces back up at some optimal point. To me, this plot shows a very successful training outcome.
I got the following error when running the code — am I the only one? I am running it using VS:
```
PS C:\Users\T929189\PycharmProjects> & C:/Users/T929189/AppData/Local/Programs/Python/Python39/python.exe "c:/Users/T929189/PycharmProjects/Deep Learning/4.4_MPL_v1.1.py"
Traceback (most recent call last):
  File "c:\Users\T929189\PycharmProjects\Deep Learning\4.4_MPL_v1.1.py", line 56, in <module>
    train(poly_features[:n_train, :4], poly_features[n_train:, :4],
  File "c:\Users\T929189\PycharmProjects\Deep Learning\4.4_MPL_v1.1.py", line 48, in train
    d2l.train_epoch_ch3(net, train_iter, loss, trainer)
  File "C:\Users\T929189\AppData\Local\Programs\Python\Python39\lib\site-packages\d2l\torch.py", line 271, in train_epoch_ch3
    l.backward()
  File "C:\Users\T929189\AppData\Roaming\Python\Python39\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\T929189\AppData\Roaming\Python\Python39\site-packages\torch\autograd\__init__.py", line 141, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors)
  File "C:\Users\T929189\AppData\Roaming\Python\Python39\site-packages\torch\autograd\__init__.py", line 50, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
```
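This error means the loss `l` passed to `backward()` is a non-scalar tensor, e.g. a per-example loss vector as produced by `nn.MSELoss(reduction='none')` in some versions of the d2l code. A minimal sketch of the failure and the usual fix (reduce the loss to a scalar before calling `backward`); the shapes here are illustrative:

```python
import torch
from torch import nn

x = torch.randn(4, 2)
y = torch.randn(4, 1)
w = torch.randn(2, 1, requires_grad=True)

# reduction='none' keeps one loss per example -> a (4, 1) tensor, not a scalar
loss_vec = nn.MSELoss(reduction='none')(x @ w, y)
try:
    loss_vec.backward()  # non-scalar output: raises the RuntimeError above
except RuntimeError as e:
    print(e)

# Fix: reduce to a scalar first (sum() also works, with a different gradient scale)
loss_vec.mean().backward()
assert w.grad is not None
```

So the mismatch is likely between the loss definition in your script and the `l.backward()` call inside `d2l.train_epoch_ch3`; either use `nn.MSELoss()` (which averages to a scalar) or make sure the training loop reduces the loss before backpropagating.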