https://d2l.ai/chapter_linear-classification/softmax-regression-scratch.html

In 3.6.4 we use `y_hat[range(len(y_hat)), y]`. What does `y_hat[y]` mean?

```
print(y_hat[y])
```

Running it raises `IndexError: index 2 is out of bounds for dimension 0 with size 2` at `print(y_hat[y])`.
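A small sketch of what the two indexing forms do, using a toy `y` and `y_hat` (two examples, three classes; the values are illustrative, not from the post). `y_hat[y]` treats `y` as *row* indices, so label `2` asks for a third row that a 2-row tensor does not have. `y_hat[range(len(y_hat)), y]` instead pairs row `i` with column `y[i]`, picking the predicted probability of the true class for each example.

```python
import torch

# Hypothetical batch: 2 examples, 3 classes
y = torch.tensor([0, 2])                      # true labels
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])       # predicted probabilities

# Pair row i with column y[i]: picks y_hat[0, 0] and y_hat[1, 2]
picked = y_hat[range(len(y_hat)), y]
print(picked)  # tensor([0.1000, 0.5000])

# y_hat[y] indexes rows only; row 2 does not exist in a 2-row tensor
try:
    y_hat[y]
except IndexError as err:
    print("IndexError:", err)
```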

And I found other styles:

```
import torch

def cross_entropy(y_hat, y):
    return -torch.log(y_hat.gather(1, y.view(-1, 1)))
```

What are the differences?
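A quick comparison, assuming the same toy batch as in the question above: both styles select the same probabilities, so the loss values are identical. The visible difference is the output shape. Advanced indexing returns shape `(batch,)`, while `gather` with a `(batch, 1)` index returns shape `(batch, 1)`, which can matter if the result is later broadcast against another tensor.

```python
import torch

y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])

# Style 1: advanced indexing, result has shape (batch,)
loss_index = -torch.log(y_hat[range(len(y_hat)), y])

# Style 2: gather along dim 1 with a (batch, 1) index, result has shape (batch, 1)
loss_gather = -torch.log(y_hat.gather(1, y.view(-1, 1)))

print(loss_index.shape, loss_gather.shape)              # torch.Size([2]) torch.Size([2, 1])
print(torch.equal(loss_index, loss_gather.squeeze(1)))  # True
```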

In `class Accumulator:`, what is the meaning of `a+float(b)` in `self.data = [a+float(b) for a, b in zip(self.data, args)]`?
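A minimal re-creation of the class to show what that line does (assumed to follow the book's helper closely; `reset` and `__getitem__` are included so it runs on its own). `Accumulator(n)` keeps `n` running sums. Each call to `add` zips the current sums with the new arguments and adds them pairwise; `float(b)` first converts whatever is passed in (a Python int, a float, or a 0-d tensor) into a plain Python float, so every slot stays a float.

```python
import torch

class Accumulator:
    """Keep n running sums (sketch of the book's helper)."""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Add the i-th argument to the i-th running sum; float(b) turns
        # ints and 0-d tensors into plain Python floats first.
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

metric = Accumulator(3)
metric.add(torch.tensor(2.5), 3, 4)   # batch 1: loss sum, correct count, batch size
metric.add(torch.tensor(1.5), 2, 4)   # batch 2
print(metric.data)  # [4.0, 5.0, 8.0]
```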

It would be even better if you could combine these to **explain what happens behind** `metric.add(float(l)*len(y), float(accuracy(y_hat, y)), len(y))` and `metric.add(l_sum, accuracy(y_hat, y), y.numpy().size)`.
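In both calls the last argument is simply the number of examples in the batch. For a 1-D label tensor, `len(y)`, `y.numpy().size`, and `y.numel()` all give the same count, as this check with a hypothetical batch of 256 labels shows. They only diverge for multi-dimensional tensors, where `len` counts the first dimension only.

```python
import torch

y = torch.randint(0, 10, (256,))  # hypothetical batch of 256 labels

# Three ways to count the examples in a 1-D label tensor
print(len(y), y.numpy().size, y.numel())  # 256 256 256

# For a 2-D tensor, len() counts rows while numel() counts all elements
y2 = y.view(16, 16)
print(len(y2), y2.numel())  # 16 256
```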

## 3.6.9

- Nothing happened!? And the maximum float64 value is *2^1024 - 2^(1023-52)*.

So **e^1024 will overflow**?

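```
import torch
def softmax(X):  # from-scratch softmax, assumed as defined in this section
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(1, keepdim=True)

X = torch.tensor([[50., 51., 52.], [54., 55., 56.]])
X_prob = softmax(X)
X_prob
```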

tensor([[0.0900, 0.2447, 0.6652],
[0.0900, 0.2447, 0.6652]])
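A quick check of the overflow reasoning. Python's `sys.float_info.max` is exactly 2^1024 - 2^971, i.e. 2^1024 - 2^(1023-52), and since e > 2, e^1024 is far beyond it; `exp` already overflows just above x ≈ 709.78 in float64. PyTorch's default dtype is float32, whose limit (about 3.4e38) is reached near x ≈ 88.72, which is why inputs around 50 to 56, as in the example above, cause no trouble.

```python
import math
import sys
import torch

# The largest float64 matches the formula in the post
print(sys.float_info.max == 2**1024 - 2**(1023 - 52))  # True

# float64: exp overflows to inf just above x ≈ 709.78
print(torch.exp(torch.tensor([709.0, 710.0], dtype=torch.float64)))

# float32 (PyTorch's default dtype): exp overflows just above x ≈ 88.72
print(torch.exp(torch.tensor([88.0, 89.0])))

# Plain Python floats raise instead of returning inf
try:
    math.exp(1024)
except OverflowError as err:
    print("OverflowError:", err)
```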

- log(0) will error!
- Use ReLU to replace softmax?
- In medical diagnosis, we may care more about finding all possible results (conditions), to avoid the patient's condition worsening.
- A large vocabulary makes every word's probability close to 0.
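On the log(0) point: in PyTorch, `torch.log(0)` does not raise an exception; it silently returns `-inf`, so the cross-entropy for that example becomes `inf`. A common workaround, shown here as a sketch with a hypothetical `eps` and toy probabilities, is to clamp the probabilities away from zero before taking the log.

```python
import torch

y_hat = torch.tensor([[0.0, 0.4, 0.6],
                      [0.3, 0.2, 0.5]])   # hypothetical: true class 0 got probability 0
y = torch.tensor([0, 2])

p = y_hat[range(len(y_hat)), y]
print(-torch.log(p))   # first entry is inf, because log(0) = -inf

# Sketch of a workaround: clamp probabilities away from 0 before the log
eps = 1e-12            # hypothetical small constant
print(-torch.log(p.clamp(min=eps)))  # finite: about 27.63 for the first entry
```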

In the `train_epoch_ch3` function, in the line `metric.add(float(l)*len(y), float(accuracy(y_hat, y)), len(y))`, I don't understand why we need to multiply the loss `l` by the length of the label tensor. Since we are accumulating the loss, wouldn't it be fine if we didn't multiply it?

Hi @Kushagra_Chaturvedy, the reason for multiplying by `len(y)` is that torch's built-in loss function, i.e. `nn.CrossEntropyLoss`, reduces the loss to the mean by default (see the default parameter value `reduction='mean'`). In our case we want the sum, so multiplying by `len(y)` gives us the sum.

This is actually used in the concise softmax implementation; you can check that chapter.
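A small check of that reasoning with random logits (the batch below is hypothetical): the default `nn.CrossEntropyLoss()` returns the batch mean as a scalar, and multiplying it by `len(y)` recovers what `reduction='sum'` would have returned.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 3)          # hypothetical batch: 4 examples, 3 classes
y = torch.tensor([0, 2, 1, 2])

mean_loss = nn.CrossEntropyLoss()(logits, y)                # reduction='mean' (default)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, y)

# Multiplying the mean by the batch size recovers the sum
print(float(mean_loss) * len(y), float(sum_loss))
print(torch.isclose(mean_loss * len(y), sum_loss))  # tensor(True)
```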

Thanks for the reply @anirudh. A couple more things: why are we accumulating the sum of the loss? Wouldn't it make more sense to compute the mean loss from the loss tensor and accumulate that, instead of accumulating the sum of the values in the loss tensor? Also, if I defined the updater as an instance of `torch.optim.SGD`, then `pt_optimizer` would return True, right? And in that case, how will the calculated loss `l` be a scalar? (In the `pt_optimizer=True` branch we call `l.backward()` instead of `l.sum().backward()`, which implies that `l` is a scalar.)
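On why the sum is accumulated: the last minibatch of an epoch is often smaller than the others, so averaging per-batch means would give its examples extra weight. Accumulating loss sums and example counts, then dividing once (as the `return metric[0] / metric[2], ...` line of `train_epoch_ch3` does), gives the exact per-example mean. A toy illustration with hypothetical per-example losses:

```python
# Hypothetical per-example losses: a full batch of 3 and a smaller last batch of 1
batches = [[1.0, 1.0, 1.0], [5.0]]

# Average of per-batch means: the lone example in the last batch
# counts as much as the whole first batch
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

# Accumulate the loss sum and the example count, then divide once
total_loss = sum(sum(b) for b in batches)
total_count = sum(len(b) for b in batches)
true_mean = total_loss / total_count

print(mean_of_means, true_mean)  # 3.0 2.0
```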

Hi everyone,

I want to know why we have not used `with torch.no_grad():` while calculating the loss in the `cross_entropy()` function or while evaluating the model. We used it in the linear regression from scratch, so why are we not using it in this chapter?

I think we should use `train_metrics = train_epoch_ch3(net, train_iter, loss, updater)`, then `with torch.no_grad(): test_acc = evaluate_accuracy(net, test_iter)`, then `animator.add(epoch + 1, train_metrics + (test_acc,))` in the `train_ch3()` function.

Please let me know.

Thank you.

Hi @lokeshkvn, great question. We use `net.eval()` in the `evaluate_accuracy()` function, which sets the model to evaluation mode. See also the PyTorch Forums thread "'model.eval()' vs 'with torch.no_grad()'".

```
def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy for a model on a dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # Set the model to evaluation mode
    metric = Accumulator(2)  # No. of correct predictions, no. of predictions
    for _, (X, y) in enumerate(data_iter):
        metric.add(accuracy(net(X), y), d2l.size(y))
    return metric[0] / metric[1]
```
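For reference, `net.eval()` and `torch.no_grad()` do different jobs: `eval()` switches layers such as dropout and batch norm to their inference behavior, while `no_grad()` is what turns off gradient tracking. A small sketch with a hypothetical `nn.Linear` model shows that outputs computed after `eval()` still require gradients:

```python
import torch
from torch import nn

net = nn.Linear(2, 1)     # hypothetical tiny model
X = torch.randn(4, 2)

net.eval()                          # evaluation mode only
out_eval = net(X)
print(out_eval.requires_grad)       # True: autograd still records the graph

with torch.no_grad():               # disables gradient tracking
    out_nograd = net(X)
print(out_nograd.requires_grad)     # False
```

Either way the accuracy numbers are the same, because no `backward()` is called during evaluation; `no_grad()` just avoids building a graph that is never used.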

What does "No. of correct predictions, no. of predictions" mean?

Number of correct predictions, number of total predictions.
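To make the two slots concrete: the `accuracy` helper (sketched here after the book's version, with `dim=1` used for `argmax`) returns the *count* of correct predictions in a batch, and `evaluate_accuracy` divides the summed count by the summed number of predictions. The toy batch is hypothetical.

```python
import torch

def accuracy(y_hat, y):
    """Number of correct predictions (sketch of the book's helper)."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(dim=1)       # predicted class per row
    cmp = y_hat.type(y.dtype) == y        # elementwise hit/miss
    return float(cmp.type(y.dtype).sum())

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

correct = accuracy(y_hat, y)              # what goes into metric[0]
total = y.numel()                         # what goes into metric[1]
print(correct, total, correct / total)    # 1.0 2 0.5
```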

Hi all, has anyone encountered the issue "The kernel appears to have died. It will restart automatically." when running **train_ch3**? I checked, and it is the code below in **train_ch3** that causes the issue:

`animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9], legend=['train loss', 'train acc', 'test acc'])`

Does any one know the reason for this issue? Any help would be appreciated.

@Gavin

Have you googled it?

You can try `pip uninstall numpy`, then `pip install -U numpy` (from https://www.youtube.com/watch?reload=9&v=RhpkTBvb-WU).

If you have any other questions, try to solve them by googling first.

If you still have problems, then publish all your code or give me a GitHub URL, along with more information about your environment.

@StevenJokes Thanks a lot. It solved my issue. It turns out my *numpy* version was **1.18.1**; after I updated it to **1.19.1**, the code works perfectly. Btw, I did google it before I asked, but couldn't find the right answer.

Give the helpful reply a like (the heart). It will make the forum more active.

In `train_epoch_ch3()`, I wonder why we can call `l.backward()` without passing a tensor as an argument, since `l` is non-scalar, and why we call `l.sum()` in the else block before `.backward()`.

```
def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """The training loop defined in Chapter 3."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, no. of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            updater.zero_grad()
            l.backward()
            updater.step()
            metric.add(float(l) * len(y), accuracy(y_hat, y),
                       y.size().numel())
        else:
            l.sum().backward()
            updater(X.shape[0])
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]
```

@oliver PTAL at my PR here, which can probably explain your doubt. I've added comments to the code to make it clear.

Let me know if it is still unclear.

Thanks for your reply, but I still don't get it. I think the `.backward()` method has a default argument `torch.tensor(1)` for a scalar, but when it is called on a non-scalar, an argument is required, am I right? What's the difference between the built-in modules and the custom ones?

Hi @oliver, sorry for the late reply.

The built-in loss criterion in PyTorch used here automatically reduces the loss to a scalar value using the argument `reduction` = 'mean'/'sum' (the default is 'mean'). You can check this out here. For our custom loss we need to achieve the same reduction, hence we call `l.sum()` before calling `backward()`.

I hope this will clarify the doubt.
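A small sketch of the point above, using a hypothetical per-example loss vector `l`: calling `backward()` on a non-scalar tensor without an argument raises a `RuntimeError`, because an implicit gradient is only created for scalar outputs. `l.sum().backward()` works, and it gives the same gradients as passing `torch.ones_like(l)` explicitly.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
l = w ** 2                      # hypothetical per-example losses (non-scalar)

try:
    l.backward()                # no gradient argument for a non-scalar output
except RuntimeError as err:
    print("RuntimeError:", err)

l.sum().backward()              # reduce to a scalar first
grad_sum = w.grad.clone()

w.grad = None                   # clear the stored gradient
l = w ** 2                      # rebuild the graph
l.backward(torch.ones_like(l))  # explicit gradient of ones
print(grad_sum, w.grad)         # both equal 2 * w
```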

It's very fun to study with this material. It's quite amazing, with a lot of good stuff.

I wanted to ask about 3.6.9, Solution 3:

How can we overcome the problem of overflow in the softmax probabilities? Since we're dealing with an exponential function, could we normalize everything? I mean taking z_i = (x_i - mu(x)) / std(x) and plugging it into the exponential function, so that we can compute exp(z_i) without overflow.
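A sketch for comparison (the inputs are hypothetical, and the max-shift is the commonly used trick rather than something stated in this thread): subtract the row-wise maximum before exponentiating. Softmax is unchanged when the same constant is added to every input of a row, which is also why both rows of `[[50, 51, 52], [54, 55, 56]]` give identical probabilities. It is *not* unchanged by dividing by the standard deviation, so the standardization idea avoids overflow but produces different probabilities.

```python
import torch

def softmax_naive(X):
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(1, keepdim=True)

def softmax_stable(X):
    # Subtracting the row max leaves softmax unchanged but keeps exp(.) <= 1
    X_shifted = X - X.max(dim=1, keepdim=True).values
    return softmax_naive(X_shifted)

def softmax_standardized(X):
    # The idea from the question: z = (x - mean) / std per row
    Z = (X - X.mean(dim=1, keepdim=True)) / X.std(dim=1, keepdim=True)
    return softmax_naive(Z)

X = torch.tensor([[1000., 1002., 1004.]])   # hypothetical large logits

print(softmax_naive(X))         # nan: exp(1000) overflows to inf in float32
print(softmax_stable(X))        # matches torch.softmax(X, dim=1)
print(softmax_standardized(X))  # no overflow, but different probabilities
```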