Softmax Regression Implementation from Scratch

goldpiggy · October 4, 2020, 11:35pm

Hi @Luis_Ramirez, great question. We allude to the “LogSumExp trick” in Section 3.7.

Yue_Ying · October 29, 2020, 8:26am

Dear authors, I don’t understand the use of enumerate in the loop of the function evaluate_accuracy.
Why not use for X, y in data_iter: instead?

goldpiggy · November 2, 2020, 9:55pm

Hi @Yue_Ying, great catch! Would you like to post a PR to correct it and be a contributor?
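For readers following along, here is a minimal pure-Python sketch of the loop written without enumerate. It is not the book's implementation: the list-based accuracy helper, identity_net, and the toy batch are assumptions made only so the example runs without a deep learning framework.

```python
def accuracy(y_hat, y):
    """Count the rows of y_hat whose largest entry sits at the labelled index."""
    correct = 0
    for probs, label in zip(y_hat, y):
        predicted = max(range(len(probs)), key=lambda j: probs[j])
        correct += int(predicted == label)
    return correct


def evaluate_accuracy(net, data_iter):
    """Fraction of correct predictions over every batch in data_iter."""
    correct, total = 0, 0
    for X, y in data_iter:  # no enumerate: the batch index is never used
        correct += accuracy(net(X), y)
        total += len(y)
    return correct / total


# Toy data (assumed): one batch whose "features" are already probabilities.
toy_iter = [([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]], [0, 2])]
identity_net = lambda X: X  # hypothetical stand-in for a model
toy_accuracy = evaluate_accuracy(identity_net, toy_iter)
```

Since the batch index is never read, iterating over data_iter directly behaves the same as the enumerate version.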

bergamo_bobson · November 11, 2020, 2:49am

@goldpiggy In the accuracy function, why don’t we simply use the mean instead of taking the sum and dividing by the length later?
I mean, use tf.math.reduce_mean(cmp.type(y.dtype))

goldpiggy · November 16, 2020, 11:00pm

Great question. Here we just want to align the TF implementation with the others.
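A side note that may help: returning a count rather than a mean also makes it easy to accumulate across batches of unequal size. The sketch below uses made-up per-batch counts (an assumption, not data from the book) to show that the mean of per-batch means can differ from the overall accuracy.

```python
# Assumed (number correct, batch size) pairs; the last batch is smaller.
batches = [(3, 4), (4, 4), (0, 1)]

# Accumulate counts, divide once at the end.
total_correct = sum(c for c, _ in batches)
total_seen = sum(n for _, n in batches)
accuracy_from_counts = total_correct / total_seen

# Average each batch first, then average the averages.
batch_means = [c / n for c, n in batches]
mean_of_means = sum(batch_means) / len(batch_means)
```

Here accuracy_from_counts is 7/9 while mean_of_means is 1.75/3, because the single-example batch is weighted as heavily as the full batches.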

Reno · February 5, 2021, 4:22am

Hi,

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """The training loop defined in Chapter 3."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, no. of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Using PyTorch in-built optimizer & loss criterion
            updater.zero_grad()
            l.backward()
            updater.step()
            metric.add(float(l) * len(y), accuracy(y_hat, y),
                       y.size().numel())
        else:
            # Using custom built optimizer & loss criterion
            l.sum().backward()
            updater(X.shape[0])
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

The code above works with a custom updater. But it will raise an error if we use the built-in optimizer, since the loss function then has to be the built-in loss function as well. Please update the block to make this clearer.

SinclairWang · February 9, 2021, 1:25pm

What if ‘isinstance(net, torch.nn.Module)’ returns False? Then eval mode won’t be set.

AbL · February 26, 2021, 7:25am

Hi, in the PyTorch implementation of train_epoch_ch3, is there any reason why we are using y.size().numel() instead of simply y.numel()?

if isinstance(updater, torch.optim.Optimizer):
    ...
    metric.add(float(l) * len(y), accuracy(y_hat, y),
                   y.size().numel())

anirudh · March 5, 2021, 2:21am

Thanks for raising this. It is probably a small bug while porting old code. Now fixed here

Bann_Comehere · March 8, 2021, 4:01pm

y hat means the output: non-linear activation function(bias + linear combination of inputs)
@StevenJokess

HomunculusK · July 16, 2021, 8:06pm

if isinstance(net, torch.nn.Module):
Here net is a plain Python function, so this line evaluates to False.

fanbyprinciple · July 17, 2021, 12:44am

My answers, for discussion:

Exercises

In this section, we directly implemented the softmax function based on the mathematical
definition of the softmax operation. What problems might this cause? Hint: try to calculate
the size of exp(50).

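The number is too big: exp(50) is about 5.18e21, and somewhat larger inputs overflow (float32 overflows near exp(89)), after which softmax returns inf or nan.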

The function cross_entropy in this section was implemented according to the definition of
the cross-entropy loss function. What could be the problem with this implementation? Hint:
consider the domain of the logarithm.

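The domain of the logarithm is the strictly positive numbers. If the predicted probability of the true label underflows to 0, log(0) is -inf and the loss becomes infinite.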

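What solutions can you think of to fix the two problems above?

Normalise the logits first: subtract the largest logit before exponentiating (this leaves softmax unchanged), and compute the log of the softmax directly rather than taking the log of a probability that may be 0.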
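One common fix for both problems is to subtract the largest logit before exponentiating and to work with log-probabilities directly (the LogSumExp trick). The pure-Python sketch below is an illustration under that assumption, not the book's code; the function names are made up for this example. Note that math.exp raises OverflowError on overflow, whereas tensor libraries typically return inf instead.

```python
import math


def naive_softmax(logits):
    """Direct definition; math.exp overflows for large logits."""
    exps = [math.exp(o) for o in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stable_log_softmax(logits):
    """log softmax(o)_j = o_j - (m + log sum_k exp(o_k - m)), with m = max(o)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(o - m) for o in logits))
    return [o - log_sum_exp for o in logits]


def stable_cross_entropy(logits, label):
    """Cross-entropy computed from logits, never taking log of a zero probability."""
    return -stable_log_softmax(logits)[label]


extreme_logits = [1000.0, 0.0, -1000.0]  # assumed values that break the naive version
try:
    naive_softmax(extreme_logits)
    naive_failed = False
except OverflowError:
    naive_failed = True

extreme_loss = stable_cross_entropy(extreme_logits, 2)  # finite even for the least likely class
```

Because the largest shifted logit is always 0, the sum inside the logarithm is at least 1, so neither the exponential nor the logarithm can blow up.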

Is it always a good idea to return the most likely label? For example, would you do this for
medical diagnosis?

No. In medical diagnosis we need high confidence in our predictions, so we should return the most likely label only when its probability is above a confidence threshold.
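A thresholded prediction could be sketched as follows. The function name, the default threshold of 0.9, and the choice to return None when the model is unsure are all assumptions for illustration; a real application would pick these based on the cost of each kind of error.

```python
def predict_with_threshold(probs, threshold=0.9):
    """Return the most likely label, or None if the model is not confident enough."""
    best = max(range(len(probs)), key=lambda j: probs[j])
    return best if probs[best] >= threshold else None


confident = predict_with_threshold([0.05, 0.92, 0.03])  # label 1 clears the threshold
unsure = predict_with_threshold([0.40, 0.35, 0.25])     # no label clears it
```

Returning None gives the caller a chance to defer the case, for example to a human expert, instead of acting on a weak guess.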

Assume that we want to use softmax regression to predict the next word based on some
features. What are some problems that might arise from a large vocabulary?

If the vocabulary is large, the one-hot encoding of each label is large and sparse, and we cannot directly apply softmax over all possible y values, whose number equals the size of the vocabulary.
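To get a feel for the scale, here is a back-of-the-envelope count for a single linear output layer. Both sizes are hypothetical, chosen only for illustration.

```python
# Hypothetical sizes, chosen only for illustration.
vocab_size = 100_000
num_features = 512

weight_params = num_features * vocab_size  # one weight column per word
bias_params = vocab_size                   # one bias per word
output_layer_params = weight_params + bias_params

# Every prediction also needs one exponential per word for the softmax denominator.
exps_per_prediction = vocab_size
```

With these numbers the output layer alone holds 51,300,000 parameters, and each softmax evaluation touches every word in the vocabulary.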

Ye_Zhang · January 15, 2022, 2:43am

y_hat[0] returns the first tensor [0.1, 0.3, 0.6], and y_hat[1] returns the second. We want the probability of the true label in each tensor, 0.1 and 0.5; the true labels are [0, 2], which is y. So y_hat[[0, 1], [0, 2]] returns the first probability of the first tensor and the third probability of the second tensor.
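The same selection can be spelled out with plain Python lists, which may make the pairing of row and column indices easier to see. The second row of y_hat is assumed here for illustration; only its third entry, 0.5, is stated above.

```python
y_hat = [[0.1, 0.3, 0.6],
         [0.3, 0.2, 0.5]]  # second row assumed; its third entry is 0.5
y = [0, 2]                 # true labels, used as column indices
rows = [0, 1]              # one row index per example

# Pair rows[k] with y[k], mirroring what y_hat[[0, 1], y] does on a tensor.
picked = [y_hat[i][j] for i, j in zip(rows, y)]
```

Here picked is [0.1, 0.5]: one probability per example, taken at the position of that example's true label.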

Debanjan_Das · January 15, 2022, 4:23am

@anirudh Why do we always check for the instance type (torch.nn.Module) of net? For the training and evaluation method we can just directly use net.train() and net.eval() respectively. Am I missing something here? Thanks!

anirudh · April 10, 2022, 3:08pm

Hi @Debanjan_Das,
We also have a few models which are built from scratch and those models do not have the train or eval attribute since they are not subclassing nn.Module. This is just a way to reuse the saved functions making them compatible with scratch and concise versions of PyTorch code.
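The idea can be sketched without PyTorch. In the example below, Module is a hypothetical stand-in for torch.nn.Module with just enough behaviour for the illustration, and set_eval_mode is an invented helper name, not a function from the book.

```python
class Module:
    """Hypothetical stand-in for torch.nn.Module, just enough for this sketch."""

    def __init__(self):
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False


class ConciseNet(Module):
    """A model built on the framework class; it inherits train() and eval()."""

    def __call__(self, X):
        return X


def scratch_net(X):
    """A from-scratch model written as a plain function; it has no eval()."""
    return X


def set_eval_mode(net):
    """Switch to eval mode only when the model supports it."""
    if isinstance(net, Module):
        net.eval()
        return True
    return False


concise = ConciseNet()
switched_concise = set_eval_mode(concise)      # eval() is called
switched_scratch = set_eval_mode(scratch_net)  # skipped: a function has no mode
```

Calling scratch_net.eval() directly would raise AttributeError, which is why the check is there. For a plain function with no mode-dependent layers, skipping the call loses nothing.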

azimjon · June 6, 2022, 12:58pm

How to change the range of the y axis?

I tried to change ylim in the train_ch3 function, but it didn’t work.

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    """Train a model (defined in Chapter 3).

    Defined in :numref:`sec_softmax_scratch`"""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.0, 1.0],
                        legend=['train loss', 'train acc', 'test acc'])

GabrielC · August 13, 2022, 10:00pm

Is it a typo?
“The correct labels are 1 and 2 respectively”

The correct labels should be 0 and 2, not 1 and 2.

USTBweishu · August 14, 2022, 3:24am

Hello, and first of all, thanks for reading. I have compared the English version of this textbook with the Chinese version, and I find that the code differs a lot between the two. What I want to verify is whether the code in the English version has been changed, because it now always uses classes for encapsulation.

Sim_OCRDL · December 28, 2022, 4:31am

Hi. I have a question about the loss function in 4.4.3. In the code example, the loss is averaged with .mean() in the implementation of the cross_entropy() function. Should it be .sum() instead? What is the difference between using the mean and summing up?
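A small numerical sketch may help with this question. It assumes the probabilities below are softmax outputs and uses the standard result that the gradient of softmax cross-entropy with respect to the logits is the predicted probabilities minus the one-hot label. The values and the helper name are made up for illustration.

```python
import math

# Assumed predictions and labels, for illustration only.
probs = [[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]]
labels = [0, 2]
n = len(labels)

# Per-example cross-entropy: -log of the probability of the true label.
losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
loss_sum = sum(losses)
loss_mean = loss_sum / n


def grad_wrt_logits(p, y, scale):
    """Gradient of scale * cross-entropy w.r.t. the logits: scale * (p - onehot(y))."""
    return [scale * (p_j - (1.0 if j == y else 0.0)) for j, p_j in enumerate(p)]


# Summing uses scale 1 per example; averaging uses scale 1/n.
grads_sum = [grad_wrt_logits(p, y, 1.0) for p, y in zip(probs, labels)]
grads_mean = [grad_wrt_logits(p, y, 1.0 / n) for p, y in zip(probs, labels)]
```

The two losses have the same minimiser; the gradients differ only by the factor n. With .sum(), the step size therefore grows with the batch size unless the learning rate is scaled down, which is why the custom branch of train_epoch_ch3 quoted earlier in this thread sums the loss and then passes the batch size to the updater.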

pandalabme · August 18, 2023, 7:37am

My solutions to the exercises: 4.4