Softmax Regression Implementation from Scratch

Hi everyone,

I want to know why we have not used with torch.no_grad(): while calculating the loss in the cross_entropy() function or while evaluating the model, since we used it in the linear regression from scratch. Why are we not using it in this chapter?

I think we should use

train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
with torch.no_grad():
    test_acc = evaluate_accuracy(net, test_iter)
    animator.add(epoch + 1, train_metrics + (test_acc,))

in the train_ch3() function.

Please let me know.
Thank you.

Hi @lokeshkvn, great question. We use net.eval() in the evaluate_accuracy() function, which sets the model to evaluation mode. See 'model.eval()' vs 'with torch.no_grad()' on the PyTorch Forums.

def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy for a model on a dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # Set the model to evaluation mode
    metric = Accumulator(2)  # No. of correct predictions, no. of predictions
    for _, (X, y) in enumerate(data_iter):
        metric.add(accuracy(net(X), y), d2l.size(y))
    return metric[0] / metric[1]
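
For what it's worth, you can also combine the two: below is a minimal sketch (reusing the chapter's Accumulator and accuracy helpers) that keeps net.eval() and additionally wraps the loop in torch.no_grad(), so no computation graph is built while measuring accuracy.

def evaluate_accuracy_no_grad(net, data_iter):
    """Sketch: accuracy evaluation with gradient tracking disabled."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # Switch layers such as dropout/batchnorm to eval behavior
    metric = Accumulator(2)  # No. of correct predictions, no. of predictions
    with torch.no_grad():  # Skip autograd bookkeeping entirely during evaluation
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]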

What does " No. of correct predictions, no. of predictions" mean?

Number of correct predictions, number of total predictions.
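
For context, the accuracy helper that feeds metric[0] is roughly the following: it returns the number of correct predictions in one batch, while metric[1] accumulates the total number of predictions.

def accuracy(y_hat, y):
    """Compute the number of correct predictions in a batch."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)  # Pick the most likely class per example
    cmp = y_hat.type(y.dtype) == y  # Element-wise correctness
    return float(cmp.type(y.dtype).sum())  # Count of correct predictions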

Hi all, has anyone encountered the issue “The kernel appears to have died. It will restart automatically.” when running train_ch3? I checked, and it is the code below in train_ch3 that causes the issue:

animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                    legend=['train loss', 'train acc', 'test acc'])

Does anyone know the reason for this issue? Any help would be appreciated.

@Gavin
Have you googled it?
You can try pip uninstall numpy, then pip install -U numpy (from https://www.youtube.com/watch?reload=9&v=RhpkTBvb-WU).
If you have any other questions, try to solve them by googling first.
If you still have a problem, then publish all your code or give me a GitHub URL, along with more information about your environment.

@StevenJokes Thanks a lot. It solved my issue. It turns out my numpy version was 1.18.1; after I updated it to 1.19.1, the code works perfectly.

Btw, I did google it before I asked, but couldn’t find the right answer. :rofl:

Give the helpful reply a like. It will make the forum more active. :blush:

In train_epoch_ch3(), I wonder why we can call l.backward() without passing a tensor as an argument since l is non-scalar, and why we call l.sum() in the else block before .backward().

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """The training loop defined in Chapter 3."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, no. of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            updater.zero_grad()
            l.backward()
            updater.step()
            metric.add(float(l) * len(y), accuracy(y_hat, y),
                       y.size().numel())
        else:
            l.sum().backward()
            updater(X.shape[0])
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

@oliver PTAL at my PR here, which can probably explain your doubt. I've added comments to the code to make it clearer.

Let me know if it is still unclear.

Thanks for your reply, but I still don't get it. I think the .backward() method has a default argument of torch.tensor(1.) for a scalar output, but when it is called on a non-scalar, the argument is required, am I right? What's the difference between the built-in modules and custom ones?

Hi @oliver, sorry for the late reply.

The built-in loss criterion in PyTorch used here automatically reduces the loss to a scalar value via the argument reduction="mean"/"sum" (the default is "mean"). You can check this out here. For our custom loss we need to achieve the same reduction, and hence we do l.sum() before calling backward().

I hope this will clarify the doubt.
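
To make the reduction difference concrete, here is a small sketch with made-up tensors: the built-in criterion already returns a scalar, so l.backward() works directly, while an unreduced per-example loss (like the chapter's custom cross_entropy) needs a .sum() first.

import torch
from torch import nn

y_hat = torch.randn(4, 10, requires_grad=True)  # 4 examples, 10 classes (made-up logits)
y = torch.tensor([0, 3, 1, 9])

# Built-in criterion: reduction='mean' (the default) collapses the loss to a scalar
l = nn.CrossEntropyLoss(reduction='mean')(y_hat, y)
print(l.shape)  # torch.Size([]) -- a scalar, so l.backward() needs no argument
l.backward()

# reduction='none' keeps one loss value per example, like the custom cross_entropy
y_hat2 = y_hat.detach().clone().requires_grad_(True)
l_vec = nn.CrossEntropyLoss(reduction='none')(y_hat2, y)
print(l_vec.shape)  # torch.Size([4]) -- non-scalar, so reduce before backward
l_vec.sum().backward()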

It’s a lot of fun to study with this material. It’s quite amazing; a lot of good stuff.
I wanted to ask about 3.6.9, Exercise 3:
How to overcome the problem of overflow for the softmax probabilities: since we’re dealing with the exponential function, we normalize everything first. I mean we take z_i = (x_i - mu(x)) / std(x) and plug it into the exponential function so we can compute exp(z_i) without overflow.

Hi @Luis_Ramirez, great question. We allude to the “LogSumExp trick” in Section 3.7. :wink:
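
For reference, the usual form of that idea is to subtract the per-row maximum before exponentiating: subtracting a constant leaves the softmax output unchanged (the factor exp(-max) cancels in numerator and denominator), whereas dividing by the standard deviation would change the probabilities. A minimal sketch:

import torch

def stable_softmax(X):
    # Subtract the row-wise max so that exp() never sees large positive inputs
    X = X - X.max(dim=1, keepdim=True).values
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(dim=1, keepdim=True)

X = torch.tensor([[1000.0, 1001.0, 1002.0]])
print(stable_softmax(X))  # Finite probabilities; a naive exp(1000.) would overflow to inf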

Dear authors, I don’t understand the use of enumerate in the loop of the evaluate_accuracy function.
Why not use for X, y in data_iter: instead?

Hi @Yue_Ying, great catch! Would you like to post a PR to correct it and be a contributor?

@goldpiggy In the accuracy function, why don’t we simply use the mean instead of using the sum and dividing by the length later?
I mean use tf.math.reduce_mean(cmp.type(y.dtype))

Great question. Here we just want to align the TF implementation with the others.
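
One practical detail worth noting: within a single batch the two are equivalent, but summing correct predictions and dividing by the accumulated count at the end also stays exact when the last batch is smaller, whereas averaging per-batch means would need weighting. A tiny sketch with made-up correctness indicators:

import tensorflow as tf

cmp = tf.constant([1.0, 0.0, 1.0, 1.0])  # 3 correct out of 4 predictions (made up)
acc_sum_then_divide = float(tf.reduce_sum(cmp)) / int(tf.size(cmp))
acc_mean = float(tf.reduce_mean(cmp))
print(acc_sum_then_divide, acc_mean)  # Both print 0.75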

Hi,

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """The training loop defined in Chapter 3."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, no. of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Using PyTorch in-built optimizer & loss criterion
            updater.zero_grad()
            l.backward()
            updater.step()
            metric.add(float(l) * len(y), accuracy(y_hat, y),
                       y.size().numel())
        else:
            # Using custom built optimizer & loss criterion
            l.sum().backward()
            updater(X.shape[0])
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

In the above code, it works fine with the custom updater, but it will raise an error if we use the built-in optimizer while the loss is not the built-in loss function (which already reduces to a scalar). Please update the block for better clarification.
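
The point can be illustrated directly: the optimizer branch assumes the loss is already a scalar, so calling l.backward() on a vector-valued custom loss raises an error. A small sketch with a made-up vector loss:

import torch

w = torch.zeros(3, requires_grad=True)
l = (w - torch.tensor([1.0, 2.0, 3.0])) ** 2  # Vector loss, one entry per example

try:
    l.backward()  # RuntimeError: grad can be implicitly created only for scalar outputs
except RuntimeError as e:
    print(e)

l.sum().backward()  # Reducing to a scalar first works, as in the else branch
print(w.grad)  # tensor([-2., -4., -6.])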

What if isinstance(net, torch.nn.Module) returns False? Then eval mode won’t be set.