# Softmax Regression from Scratch

In 3.6.4, the code uses `y_hat[range(len(y_hat)), y]`.
What would `y_hat[y]` mean instead?

``````
print(y_hat[y])
``````

IndexError Traceback (most recent call last)
in
----> 1 print(y_hat[y])

IndexError: index 2 is out of bounds for dimension 0 with size 2
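A minimal sketch of the two indexing forms, assuming a hypothetical `y_hat` with 2 examples and 3 classes and labels `y = [0, 2]` (values chosen to match the error above). With two index arguments, row `i` is paired with column `y[i]`; with `y` alone, the labels are read as row numbers, and row 2 does not exist.

```python
import torch

# Hypothetical values: 2 examples, 3 classes, true labels 0 and 2.
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

# Pairs row i with column y[i]: picks y_hat[0, 0] and y_hat[1, 2].
picked = y_hat[range(len(y_hat)), y]
print(picked)

# y_hat[y] treats y as row indices, so it asks for rows 0 and 2.
try:
    y_hat[y]
    row_lookup_failed = False
except IndexError as err:
    row_lookup_failed = True
    print("IndexError:", err)
```

So `y_hat[y]` only works by accident when every label happens to be a valid row number, and even then it returns whole rows instead of one probability per example.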

And I found other styles:

``````
def cross_entropy(y_hat, y):
    return -torch.log(y_hat.gather(1, y.view(-1, 1)))
``````

What is the difference between the two?
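As a rough comparison, assuming the same hypothetical `y_hat` and `y` as in the question, the two styles select the same entries and differ only in output shape: integer indexing returns a 1-D tensor, while `gather` along dimension 1 keeps a trailing dimension of size 1.

```python
import torch

# Hypothetical values: 2 examples, 3 classes, true labels 0 and 2.
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

# Style 1: pair each row with its label column.
by_index = -torch.log(y_hat[range(len(y_hat)), y])      # shape (2,)

# Style 2: gather along dim 1 using a column vector of labels.
by_gather = -torch.log(y_hat.gather(1, y.view(-1, 1)))  # shape (2, 1)

print(by_index.shape, by_gather.shape)
print(torch.allclose(by_index, by_gather.squeeze(1)))
```

Because the shapes differ, code that later sums or averages the loss gives the same result either way, but code that broadcasts the loss against another tensor may not.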

In `class Accumulator:`:
`self.data = [a+float(b) for a, b in zip(self.data, args)]`
What is the meaning of `a+float(b)`?
It would be even better
if you could combine these to explain what happens behind
`metric.add(float(l)*len(y), float(accuracy(y_hat, y)), len(y))`
and
`metric.add(l_sum, accuracy(y_hat, y), y.numpy().size)`
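A minimal sketch of what the accumulator does, written in plain Python and assuming a class shaped like the one quoted above (the batch numbers are made up). Each call to `add` adds its arguments, position by position, to a list of running totals; `float(b)` converts ints or one-element tensors to Python floats so the totals stay plain numbers.

```python
class Accumulator:
    """Sketch of a running-sum helper over n quantities."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Pair each running total a with the new value b and add them.
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def __getitem__(self, idx):
        return self.data[idx]


metric = Accumulator(3)  # loss sum, number correct, number of examples

# Hypothetical batch 1: mean loss 0.5 over 4 examples, 3 correct.
metric.add(0.5 * 4, 3, 4)
# Hypothetical batch 2: mean loss 0.25 over 4 examples, 4 correct.
metric.add(0.25 * 4, 4, 4)

print(metric.data)                  # [3.0, 7.0, 8.0]
print(metric[0] / metric[2])        # average loss per example: 0.375
print(metric[1] / metric[2])        # accuracy: 0.875
```

Both `metric.add(...)` calls in the question follow this pattern: the first slot receives a loss sum for the batch, the second the number of correct predictions, and the third the batch size.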

## 3.6.9

1. Nothing happened!? The largest float64 value is 2^1024 - 2^(1023-52),
so would e^1024 overflow?
``````
X = torch.tensor([[50., 51., 52.], [54., 55., 56.]])
X_prob = softmax(X)
X_prob
``````

tensor([[0.0900, 0.2447, 0.6652],
[0.0900, 0.2447, 0.6652]])

2. log(0) is a problem: it is undefined (PyTorch returns `-inf` rather than raising an error).
3. Use ReLU to replace softmax?
4. In medical diagnosis, we may need to find all possible results, not only the most likely one, to keep a condition from worsening.
5. A large vocabulary will push every word's probability close to 0.
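For the first two answers, a small sketch may help. It assumes a from-scratch softmax like the one in the chapter (`naive_softmax` and `stable_softmax` are hypothetical names) and PyTorch's default float32, where `exp` already overflows near an input of 89, so inputs of 50 to 56 are still safe. Subtracting the row maximum is one common workaround, not necessarily the book's.

```python
import torch


def naive_softmax(X):
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(1, keepdim=True)


def stable_softmax(X):
    # Shifting each row by its max leaves the softmax value unchanged
    # mathematically but keeps every exponent at or below 0.
    shifted = X - X.max(dim=1, keepdim=True).values
    X_exp = torch.exp(shifted)
    return X_exp / X_exp.sum(1, keepdim=True)


X_big = torch.tensor([[100., 101., 102.]])
naive = naive_softmax(X_big)    # exp overflows to inf, and inf / inf is nan
stable = stable_softmax(X_big)
print(naive)
print(stable)

# log(0) does not raise; it silently produces -inf.
log_zero = torch.log(torch.tensor(0.))
print(log_zero)
```

The `nan` and `-inf` values propagate into the loss and gradients without any exception, which is why these cases are easy to miss.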

In the `train_epoch_ch3` function, there is the line `metric.add(float(l)*len(y), float(accuracy(y_hat, y)), len(y))`.

I don't understand why we need to multiply the loss `l` by the length of the label tensor. Since we are accumulating the loss, wouldn't it be fine if we didn't multiply it?

Hi @Kushagra_Chaturvedy, the reason for multiplying by `len(y)` is that torch's built-in loss function, i.e. `nn.CrossEntropyLoss`, reduces the loss to its mean by default; see the default parameter value `reduction='mean'`. In our case we want the sum, so multiplying by `len(y)` gives us the sum.
This is actually used in the concise softmax implementation. You can check that chapter.
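A quick check of this, using made-up logits and labels: the default `nn.CrossEntropyLoss()` returns the batch mean as a scalar, and multiplying it by the batch size recovers what `reduction='sum'` would give.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 3)          # hypothetical batch: 4 examples, 3 classes
y = torch.tensor([0, 2, 1, 2])

mean_loss = nn.CrossEntropyLoss()(logits, y)                 # reduction='mean'
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, y)

print(mean_loss.shape)                                       # torch.Size([]), a scalar
print(torch.allclose(mean_loss * len(y), sum_loss))
```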

Thanks for the reply @anirudh. A couple more things: why are we accumulating the sum of the loss? Wouldn't it make more sense to find the mean loss from the loss tensor and accumulate that, instead of accumulating the sum of the values in the loss tensor? Also, if I defined the updater as an instance of `torch.optim.SGD`, then pt_optimizer would return True, right? In that case, how will the calculated loss `l` be a scalar (since in the pt_optimizer=True condition we call `l.backward()` instead of `l.sum().backward()`, which would imply that `l` is a scalar)?
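One possible reason for accumulating sums, sketched with made-up numbers: when the last minibatch is smaller than the others, averaging the per-batch means weights its examples more heavily, while dividing the total loss by the total number of examples gives the true per-example average.

```python
# Hypothetical per-example losses, split into batches of size 4 and 1.
batches = [[1.0, 1.0, 1.0, 1.0], [6.0]]

# Accumulate sums, divide once at the end.
total_loss = sum(sum(b) for b in batches)      # 10.0
num_examples = sum(len(b) for b in batches)    # 5
per_example_mean = total_loss / num_examples   # 2.0

# Average of batch means: the single example in the short batch
# counts as much as the four examples in the full batch.
mean_of_batch_means = sum(sum(b) / len(b) for b in batches) / len(batches)  # 3.5

print(per_example_mean, mean_of_batch_means)
```

When every batch has the same size the two numbers agree, so the difference only shows up with a ragged final batch.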

Hi everyone,

I want to know why we have not used `with torch.no_grad():` while calculating the loss in the `cross_entropy()` function or while evaluating the model. We used it in linear regression from scratch, so why are we not using it in this chapter?

I think we should use

``````
train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
test_acc = evaluate_accuracy(net, test_iter)
animator.add(epoch + 1, train_metrics + (test_acc,))
``````

in `train_ch3()` function.

Thank you.

Hi @lokeshkvn, great question. We call `net.eval()` in the `evaluate_accuracy()` function, which sets the model to evaluation mode. For how this differs from `torch.no_grad()`, see https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615

``````
def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy for a model on a dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # Set the model to evaluation mode
    metric = Accumulator(2)  # No. of correct predictions, no. of predictions
    for _, (X, y) in enumerate(data_iter):
        # Loop body missing from the post; assumed from the metric comment above
        metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]
``````
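To make the distinction in the linked thread concrete, here is a small sketch using a hypothetical one-layer `nn.Linear` model. `eval()` changes how layers such as dropout and batch norm behave, but autograd keeps tracking operations; `torch.no_grad()` is what switches tracking off.

```python
import torch
from torch import nn

net = nn.Linear(3, 2)   # hypothetical stand-in for the real model
X = torch.ones(1, 3)

net.eval()              # evaluation mode: affects layer behavior, not autograd
out_eval = net(X)

with torch.no_grad():   # disables gradient tracking inside the block
    out_no_grad = net(X)

print(out_eval.requires_grad)     # True
print(out_no_grad.requires_grad)  # False
```

Skipping `no_grad()` during evaluation does not change the accuracy; it only costs extra memory and time for a graph that is never used.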

What does "No. of correct predictions, no. of predictions" mean?

Number of correct predictions, number of total predictions.

Hi all, has anyone encountered the issue "The kernel appears to have died. It will restart automatically." when running `train_ch3`? I found that the code below in `train_ch3` causes the issue:

``````
animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                    legend=['train loss', 'train acc', 'test acc'])
``````

Does anyone know the reason for this issue? Any help would be appreciated.

@Gavin
You can try `pip uninstall numpy`, then `pip install -U numpy` from https://www.youtube.com/watch?reload=9&v=RhpkTBvb-WU
If you have any other questions, try googling them first.

@StevenJokes Thanks a lot. It solved my issue. It turns out my numpy version was 1.18.1; after I updated it to 1.19.1, the code works perfectly.

Give helpful replies a like. It will make the forum more active.

In `train_epoch_ch3()`, I wonder why we can call `l.backward()` without passing a tensor as an argument, since `l` is non-scalar, and why we call `l.sum()` in the else block before `.backward()`.

``````
def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """The training loop defined in Chapter 3."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, no. of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Built-in optimizer: the built-in loss is already a scalar mean
            updater.zero_grad()
            l.backward()
            updater.step()
            metric.add(float(l) * len(y), accuracy(y_hat, y),
                       y.size().numel())
        else:
            # Custom updater: l holds one loss per example, so reduce first
            l.sum().backward()
            updater(X.shape[0])
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]
``````

@oliver PTAL at my PR here, which can probably resolve your doubt. I've added comments to the code to make it clearer.

Let me know if it is still unclear.

Thanks for your reply, but I still don't get it. I think the `.backward()` method has a default argument of `torch.tensor(1)` for a scalar, but when it is called on a non-scalar, an argument is required. Am I right? What's the difference between built-in modules and custom ones?

Hi @oliver, Sorry for the late reply.

The built-in loss criterion in PyTorch used here automatically reduces the loss to a scalar value according to the argument `reduction='mean'` or `reduction='sum'` (the default is mean). You can check this out here. For our custom loss we need to achieve the same reduction, hence we call `l.sum()` before calling `backward()`.

I hope this will clarify the doubt.
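A small sketch of the behavior under discussion, using a made-up two-element "loss" `w ** 2` in place of the real cross-entropy. Calling `backward()` with no argument on a non-scalar tensor raises a `RuntimeError`; summing first, or passing a tensor of ones explicitly, both work and give the same gradient.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
l = w ** 2                      # non-scalar: one value per "example"

# Implicit gradient creation only works for scalar outputs.
try:
    l.backward()
    implicit_ok = True
except RuntimeError as err:
    implicit_ok = False
    print("RuntimeError:", err)

# Option 1: reduce to a scalar first, as the custom-loss branch does.
l.sum().backward()
grad_from_sum = w.grad.clone()

# Option 2: pass the upstream gradient explicitly.
w.grad.zero_()
l2 = w ** 2
l2.backward(torch.ones_like(l2))
grad_explicit = w.grad.clone()

print(grad_from_sum, grad_explicit)   # both equal 2 * w
```

A built-in criterion with `reduction='mean'` already returns a scalar, which is why that branch can call `l.backward()` directly.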

It's very fun to study with this material. It's quite amazing, a lot of good stuff.