Concise Implementation of Softmax Regression

Thanks, man!

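So the book's claim that "#@save is a special marker that saves the statement into the d2l package" isn't actually true? I was wondering how that was supposed to work.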
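As far as I know, `#@save` is just an ordinary Python comment when you run a cell. It is read by the book's own build tooling (d2lbook), which copies the marked code into the published d2l package, so running a marked cell in your own notebook does not change the package you have installed. The toy extractor below only illustrates that idea; `extract_saved` is a hypothetical helper, not d2lbook's actual implementation.

```python
# Hypothetical illustration of a build step that collects "#@save" cells.
# This is NOT d2lbook's real code, only a sketch of the idea.

def extract_saved(cells):
    """Return the (stripped) source of every cell containing a '#@save' line."""
    saved = []
    for src in cells:
        if any("#@save" in line for line in src.splitlines()):
            saved.append(src.strip())
    return saved


cells = [
    "def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save\n    ...",
    "net = make_net()  # not marked, so it stays in the notebook only",
]
# A build tool would write this text into the library file (e.g. torch.py).
library_code = "\n\n".join(extract_saved(cells))
print(library_code)
```

The point is that the copying happens when the package is built and released, not when you execute the notebook.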

I'm running the notebook in a local conda environment, but at the final training step the ipynb kernel seems to crash with the following message:

Kernel Restarting
The kernel for softmax-regression-concise.ipynb appears to have died. It will restart automatically.

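What are the hardware requirements for running this notebook? My laptop is a bit old: a 6th-gen Intel i7, 24 GB of RAM and a 4 GB NVIDIA GPU. Can it handle this? Thanks.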
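For a rough sense of scale, here is a back-of-the-envelope estimate. It assumes the setup used in this section: Fashion-MNIST (60,000 training and 10,000 test grayscale images of 28×28 pixels), float32 tensors, and the concise model `nn.Flatten()` followed by `nn.Linear(784, 10)`.

```python
# Back-of-the-envelope memory estimate (assumed setup: Fashion-MNIST,
# 28x28 grayscale images stored as float32, model = Flatten + Linear(784, 10)).
BYTES_PER_FLOAT32 = 4
num_inputs = 28 * 28   # 784 pixels per image after flattening
num_outputs = 10       # 10 clothing classes


def dataset_bytes(num_images, num_inputs=num_inputs):
    """Memory needed to hold num_images flattened images as float32."""
    return num_images * num_inputs * BYTES_PER_FLOAT32


train_bytes = dataset_bytes(60_000)
test_bytes = dataset_bytes(10_000)
num_params = num_inputs * num_outputs + num_outputs  # weights + biases

print(f"train set:  {train_bytes / 2**20:.1f} MiB")
print(f"test set:   {test_bytes / 2**20:.1f} MiB")
print(f"parameters: {num_params}")
```

Even allowing for data-loading overhead, the data and the model together come to a few hundred MiB at most, so they alone should not exhaust 24 GB of RAM or a 4 GB GPU.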

I ran into this problem too. Do I need to downgrade d2l?


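I already saved the train_ch3 function in the previous section, so why does this section still fail with module 'd2l.torch' has no attribute 'train_ch3'? I also tried saving it again and re-importing the d2l package, but it still doesn't work. Very strange.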

It's a d2l version issue; try downgrading. My default version was 1.0.3 and I got the same error; after switching to 0.17.0 it works.

With d2l==1.0.3, running d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer) fails with module 'd2l.torch' has no attribute 'train_ch3'.
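Before patching anything, it may help to confirm which d2l version is actually installed in the active kernel and whether the attribute is really missing. A minimal sketch using only the standard library; `has_attr` and `installed_version` are hypothetical helper names, not part of d2l.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version


def has_attr(module_name, attr):
    """Return True if module_name can be imported and defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # package (or one of its dependencies) is missing
        return False
    return hasattr(module, attr)


def installed_version(dist_name):
    """Installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("d2l version:", installed_version("d2l"))
print("d2l.torch has train_ch3:", has_attr("d2l.torch", "train_ch3"))
```

Running this inside the notebook (rather than a separate terminal) checks the environment the kernel actually uses.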

It seems the newer package no longer contains train_ch3. One workaround is to add the relevant code from section 3.6, namely evaluate_accuracy(), train_epoch_ch3() and train_ch3(), to the d2l package yourself:

  1. Locate the d2l package in your environment. I use a conda virtual environment named d2l with Python 3.11, so for me it is /home/cutelemon6/miniconda3/envs/d2l/lib/python3.11/site-packages/d2l
  2. Delete the old cache: rm -r /home/cutelemon6/miniconda3/envs/d2l/lib/python3.11/site-packages/d2l/__pycache__
  3. If you use PyTorch, add the following code to torch.py (it calls Accumulator, accuracy and Animator, so those must also be defined in that file; if they are not, copy them from section 3.6 as well):
def evaluate_accuracy(net, data_iter: torch.utils.data.DataLoader):
    """Compute the accuracy of a model on a dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # switch to evaluation mode
    metric = Accumulator(2)  # number of correct predictions, number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

def train_epoch_ch3(net, train_iter, loss, updater):
    """Train a model for one epoch (defined in Chapter 3)."""
    metrics = Accumulator(3)  # sum of training loss, number correct, number of samples
    if isinstance(net, torch.nn.Module):
        net.train()
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Built-in PyTorch optimizer
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Custom updater from the from-scratch implementation
            l.sum().backward()
            updater(X.shape[0])  # batch size: number of samples in X
        metrics.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    return metrics[0] / metrics[2], metrics[1] / metrics[2]

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    """Train a model (defined in Chapter 3)."""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for i in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(i + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc
  4. Restart the Jupyter notebook kernel and run all cells again.
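The functions added to torch.py in step 3 call Accumulator and accuracy. If your d2l version lacks them, here is a minimal sketch modeled on the section 3.6 versions (my own reconstruction, assumed equivalent in behavior, not copied verbatim from any d2l release):

```python
import torch


class Accumulator:
    """Keep running sums of n quantities (e.g. loss sum, #correct, #samples)."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Add each new value to its matching running total.
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


def accuracy(y_hat, y):
    """Number of correct predictions (a count, not a ratio)."""
    if y_hat.dim() > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(dim=1)  # scores per class -> predicted class index
    return float((y_hat.type(y.dtype) == y).sum())


# Tiny usage example with made-up scores for 3 samples and 2 classes.
y_hat = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
y = torch.tensor([1, 0, 0])
metric = Accumulator(2)
metric.add(accuracy(y_hat, y), y.numel())
print(metric[0] / metric[1])
```

accuracy returns a count so that Accumulator can sum it across batches of different sizes and divide by the total number of samples only once at the end.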

Hi, the train loss curve is probably missing from the plot because the loss values fall outside the plotted y-axis range (ylim=[0.3, 0.9] in train_ch3): they are either too large or too small. Check the code that computes the loss to see whether the values being added to the plot are too large or too small.
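A quick way to check this is to compare the per-epoch values against the Animator's ylim. A minimal sketch; `out_of_range` is a hypothetical helper and the loss values below are made up for illustration.

```python
# Sketch: list the epochs whose value lies outside the y-axis range used by
# the Animator in train_ch3 (ylim=[0.3, 0.9]); such points are not visible.
def out_of_range(values, ylim=(0.3, 0.9)):
    """Return (epoch, value) pairs that lie outside [ylim[0], ylim[1]]."""
    low, high = ylim
    return [(epoch, v) for epoch, v in enumerate(values, start=1)
            if not low <= v <= high]


# Hypothetical per-epoch train losses that are far too large to be plotted.
losses = [120.5, 98.3, 87.1]
print(out_of_range(losses))
```

If every point is flagged, either fix the scale of the loss or widen ylim when constructing the Animator.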

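1. If d2l's train_ch3 can't be found, you can add the following code before calling it (it relies on Accumulator, accuracy and Animator from section 3.6 being available):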

def evaluate_accuracy(net, data_iter):
    """Accuracy of net on data_iter (assumes net is an nn.Module)."""
    net.eval()
    metric = Accumulator(2)  # correct predictions, total predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

def train_epoch_ch3(net, train_iter, loss, updater):
    """One training epoch; assumes updater is a torch.optim optimizer."""
    net.train()
    metric = Accumulator(3)  # loss sum, correct predictions, number of samples
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        updater.zero_grad()
        l.mean().backward()
        updater.step()
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    return metric[0] / metric[2], metric[1] / metric[2]

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    """Train for num_epochs, plotting train loss, train acc and test acc."""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc
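If Animator is missing as well, the same loop can report metrics with print instead of plotting. Below is a self-contained sketch on random synthetic data (the data, sizes and hyperparameters are made up for illustration, not Fashion-MNIST); `train_epoch_print` is a hypothetical name. It uses nn.CrossEntropyLoss(reduction='none'), so l.mean() drives the gradient while l.sum() feeds the per-sample average, matching how the code above accumulates the loss.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for Fashion-MNIST: 512 samples, 784 features, 10 classes.
X_all = torch.randn(512, 784)
y_all = torch.randint(0, 10, (512,))
train_iter = DataLoader(TensorDataset(X_all, y_all), batch_size=64, shuffle=True)

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
loss = nn.CrossEntropyLoss(reduction='none')  # one loss value per sample
trainer = torch.optim.SGD(net.parameters(), lr=0.1)


def train_epoch_print(net, train_iter, loss, updater):
    """One epoch; returns (average loss per sample, training accuracy)."""
    net.train()
    loss_sum, correct, n = 0.0, 0.0, 0
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)          # shape: (batch_size,)
        updater.zero_grad()
        l.mean().backward()         # gradient of the batch mean
        updater.step()
        loss_sum += float(l.sum())  # summed here, divided by n at the end
        correct += float((y_hat.argmax(dim=1) == y).sum())
        n += y.numel()
    return loss_sum / n, correct / n


history = []
for epoch in range(3):
    train_loss, train_acc = train_epoch_print(net, train_iter, loss, trainer)
    history.append((train_loss, train_acc))
    print(f"epoch {epoch + 1}: loss {train_loss:.3f}, train acc {train_acc:.3f}")
```

Printing the raw numbers also shows immediately whether the loss lies inside the plot's y-range.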

2. If your train loss curve doesn't show up, add a line to the train_ch3 above that prints the loss, and check whether it is too large or too small:

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
        print(f'epoch: {epoch + 1}, train loss: {train_metrics[0]}')  # print the loss here
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

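My train loss curve isn't shown either, and the run always ends with an exception: an assertion fails because the loss is too large.

I tried learning rates of 0.1, 0.05 and 0.01, and none of them helped. Does anyone have a good solution?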

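Solved. The problem was that the loss must not be summed, because the summed value exceeds the range of the plot's axis.
Change it to: loss = nn.CrossEntropyLoss(reduction='mean')
With the mean, the curve shows up on the axis.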
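To see what the reduction argument changes, here is a small sketch with random logits (made-up data). With reduction='none' the loss is a per-sample vector, while 'mean' and 'sum' return scalars. Note that with the train_epoch_ch3 quoted in this thread, which accumulates float(l.sum()) and divides by the number of samples, 'none' yields the per-sample average, whereas with 'mean' each batch contributes only its mean, so the reported value would be batch_size times smaller. With 'sum', l.mean().backward() back-propagates the batch sum, so the gradient is batch_size times larger than intended at the same learning rate. Which setting works therefore seems to depend on the train_epoch_ch3 your d2l version ships.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, num_classes = 4, 10
logits = torch.randn(batch_size, num_classes)   # made-up network outputs
y = torch.randint(0, num_classes, (batch_size,))

l_none = nn.CrossEntropyLoss(reduction='none')(logits, y)  # shape (4,)
l_mean = nn.CrossEntropyLoss(reduction='mean')(logits, y)  # scalar
l_sum = nn.CrossEntropyLoss(reduction='sum')(logits, y)    # scalar

print(l_none.shape, l_mean.shape, l_sum.shape)
# The three are related: 'mean' == none.mean(), 'sum' == none.sum().
print(torch.allclose(l_none.mean(), l_mean), torch.allclose(l_none.sum(), l_sum))

# What train_epoch_ch3 adds to its accumulator per batch via float(l.sum()):
acc_none = float(l_none.sum())   # sum of the per-sample losses
acc_mean = float(l_mean.sum())   # only the batch mean
print(acc_none / batch_size, acc_mean / batch_size)
```

The last line shows the per-sample average under 'none' next to a value batch_size times smaller under 'mean'.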