Linear Regression

https://zh.d2l.ai/chapter_linear-networks/linear-regression.html


When will the printed Chinese second edition be published? :smiley:

There are no answers posted :frowning: The exercises are a bit hard, haha.

Is this right?

  • Problem 1: given x1, …, xn ∈ ℝ, find the b that minimizes ∑(xi − b)^2.
  • Let yi = xi − b, so the problem becomes minimizing Y = ∑yi^2. Expanding:
  • Y = ∑(xi^2 − 2xi·b + b^2) = ∑xi^2 − 2b·∑xi + n·b^2
  • Treating ∑xi^2 and 2∑xi as constants, this is a quadratic in b.
  • Since n > 0 there is a minimum, and the analytic solution I get for b is:
    (1/n)·∑xi + (1/n)·√((∑xi)^2 − n·∑xi^2)  or
    (1/n)·∑xi − (1/n)·√((∑xi)^2 − n·∑xi^2)
  • Premise: (∑xi)^2 − n·∑xi^2 ≥ 0.
    Should this be written in matrix or vector form instead?
    Intuitively, the xi are points on a line, and minimizing ∑(xi − b)^2 means minimizing the total squared distance to every point, so the answer should be the mean of all the points.

Same question here: suppose we have some data x1, …, xn ∈ ℝ. Our goal is to find a constant b that minimizes ∑_i (xi − b)^2.
How is this supposed to be handled with linear regression? Even the housing-price example I only half understand; my programmer way of thinking hasn't quite switched over yet.

Question: the line From d2l import torch as d2l fails with SyntaxError: invalid syntax. How do I fix this?

The F is capitalized? That is about the only way this single line could produce a syntax error.
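Python keywords are case-sensitive, so the line has to start with a lowercase from:

from d2l import torch as d2l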


In the gradient-descent formula, each sample's loss is differentiated with respect to the parameters and then the derivatives are summed.
But in the code, the losses of multiple samples are summed first and then differentiated with respect to the parameters???

For the first exercise, just expand and differentiate to get the minimum; it's high-school math.
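Written out in the same notation (this just spells out the derivative route mentioned above):

Y(b) = ∑(xi − b)^2 = ∑xi^2 − 2b·∑xi + n·b^2
dY/db = −2·∑xi + 2n·b = 0  ⟹  b = (1/n)·∑xi
d²Y/db² = 2n > 0

so the unique minimizer is the sample mean of the xi.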

For the first question, my solution is to let x* be the mean of X and t = x* − b; then
∑(xi − b)^2 = ∑(xi − x* + t)^2 = ∑[(xi − x*)^2 + 2t(xi − x*) + t^2]
= ∑(xi − x*)^2 + n·t^2 (the cross term ∑2t(xi − x*) vanishes because ∑xi = n·x*).
Therefore ∑(xi − b)^2 attains its minimum, ∑(xi − x*)^2 (i.e. n·D(X)), at t = 0,
so b = x*.
As for the connection to the normal distribution, it probably refers to the central limit theorem.
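A quick numerical spot check of b = x*, on made-up data, just to confirm the algebra:

import torch

x = torch.randn(1000)                        # made-up sample
candidates = torch.linspace(-1.0, 1.0, 201)  # candidate values of b
# sum of squared deviations for every candidate b at once (broadcasting)
sums = ((x[None, :] - candidates[:, None]) ** 2).sum(dim=1)
print(candidates[sums.argmin()], x.mean())   # best candidate is (close to) the mean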

3.1.6 Exercise 1

  1. The sum of squares is always ≥ 0, so where the curve's rate of change approaches 0 I take it to be a minimum; setting the derivative to 0 gives b equal to the mean of x.
  2. Honestly I am not sure about this one. My guess: if the x-axis is bins of the differences xi − b and the y-axis is the number of elements falling in each bin, then at the optimal solution this plot should itself look roughly like a normal distribution.

Differentiation is a linear operation, so it makes no difference whether you differentiate first or sum first.
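A quick check of that linearity in PyTorch (a throwaway sketch, the tensors are my own):

import torch

x = torch.randn(5, requires_grad=True)

# sum the per-sample losses first, then differentiate once
(x ** 2).sum().backward()
grad_of_sum = x.grad.clone()

# differentiate each per-sample loss separately; the gradients accumulate (are summed) in x.grad
x.grad.zero_()
for i in range(5):
    (x[i] ** 2).backward()
grad_summed = x.grad.clone()

print(torch.allclose(grad_of_sum, grad_summed))  # True: both give 2*x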

Isn't it logistic regression that uses maximum likelihood estimation? The book says linear regression does too; is that a mistake?

Just run !pip install d2l first and it will work.

#!/usr/bin/env python3

import numpy as np
import torch

class Linear:
    def __init__(self, n, eta=0.01):
        # n input features; the bias is folded into theta as the (n+1)-th row
        self.theta = torch.normal(0.0, 1.0, size=(n+1, 1), requires_grad=True)
        self.eta = eta  # learning rate

    def cal(self, x):
        # append a column of ones so the last entry of theta acts as the bias
        return torch.concat((x, torch.ones(x.shape[0], 1)), dim=1) @ self.theta

    def loss(self, x, y):
        # per-sample squared error
        return (y - self.cal(x)) ** 2

    def batch(self, x, y):
        # one gradient step on a minibatch
        loss = self.loss(x, y).sum()
        loss.backward()
        with torch.no_grad():
            self.theta -= self.eta * self.theta.grad / x.shape[0]
            # note: theta.grad is never reset, so gradients from earlier batches
            # keep accumulating into later updates
            print(f'loss={loss};theta={self.theta.reshape(3)}')


def synthetic_data(w, b, num_examples):
    """生成y=Xw+b+噪声"""
    X = torch.normal(0.0, 1.0, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 10)
mod = Linear(2)
for i in range(100):
    x, y = synthetic_data(true_w, true_b, 10)
    mod.batch(x, y)

I tried to understand the chapter and wrote a linear layer myself, but the results trained on different random data sets vary wildly. Could someone take a look and tell me whether something is wrong?

The results are as follows:

loss=531.988525390625;theta=tensor([ 0.6741,  1.5314, -1.5466], requires_grad=True)
loss=519.7105102539062;theta=tensor([ 0.6976,  1.3687, -1.3290], requires_grad=True)
loss=860.7129516601562;theta=tensor([ 0.8107,  1.0294, -0.9735], requires_grad=True)
loss=505.11395263671875;theta=tensor([ 0.9274,  0.5874, -0.5116], requires_grad=True)
loss=494.73187255859375;theta=tensor([ 1.0005, -0.0031,  0.0446], requires_grad=True)
loss=119.57245635986328;theta=tensor([ 1.0954, -0.5914,  0.6550], requires_grad=True)
loss=430.96051025390625;theta=tensor([ 1.2534, -1.3290,  1.3740], requires_grad=True)
loss=123.87307739257812;theta=tensor([ 1.4306, -2.1071,  2.1459], requires_grad=True)
loss=51.164485931396484;theta=tensor([ 1.6061, -2.9124,  2.9511], requires_grad=True)
loss=15.387931823730469;theta=tensor([ 1.7819, -3.7230,  3.7787], requires_grad=True)
loss=3.40395188331604;theta=tensor([ 1.9626, -4.5278,  4.6152], requires_grad=True)
loss=9.35221004486084;theta=tensor([ 2.1452, -5.3155,  5.4533], requires_grad=True)
loss=33.502891540527344;theta=tensor([ 2.3234, -6.0670,  6.2937], requires_grad=True)
loss=81.21678924560547;theta=tensor([ 2.4986, -6.7785,  7.1082], requires_grad=True)
loss=233.32803344726562;theta=tensor([ 2.6764, -7.3922,  7.8752], requires_grad=True)
loss=233.37728881835938;theta=tensor([ 2.8734, -7.9556,  8.5663], requires_grad=True)
loss=369.426513671875;theta=tensor([ 3.0623, -8.3969,  9.2173], requires_grad=True)
loss=420.9701232910156;theta=tensor([ 3.2856, -8.7715,  9.7596], requires_grad=True)
loss=601.8832397460938;theta=tensor([ 3.5031, -9.0670, 10.1633], requires_grad=True)
loss=554.967529296875;theta=tensor([ 3.7319, -9.2916, 10.4454], requires_grad=True)
loss=1030.767333984375;theta=tensor([ 3.9408, -9.3487, 10.5607], requires_grad=True)
loss=229.73336791992188;theta=tensor([ 4.1317, -9.3879, 10.6261], requires_grad=True)
loss=1085.172119140625;theta=tensor([ 4.2593, -9.2684, 10.5228], requires_grad=True)
loss=386.8089599609375;theta=tensor([ 4.3602, -9.1315, 10.3227], requires_grad=True)
loss=811.8597412109375;theta=tensor([ 4.3966, -8.8539, 10.0138], requires_grad=True)
loss=1223.5562744140625;theta=tensor([ 4.3963, -8.3375,  9.5231], requires_grad=True)
loss=678.25732421875;theta=tensor([ 4.3390, -7.7299,  8.8878], requires_grad=True)
loss=620.6932983398438;theta=tensor([ 4.2797, -6.9790,  8.1211], requires_grad=True)
loss=182.00277709960938;theta=tensor([ 4.1823, -6.1867,  7.3215], requires_grad=True)
loss=226.32501220703125;theta=tensor([ 4.0558, -5.3382,  6.4473], requires_grad=True)
loss=134.9973602294922;theta=tensor([ 3.8888, -4.4538,  5.5208], requires_grad=True)
loss=68.11566162109375;theta=tensor([ 3.6709, -3.5516,  4.5785], requires_grad=True)
loss=38.36671447753906;theta=tensor([ 3.4085, -2.6362,  3.6344], requires_grad=True)
loss=12.295902252197266;theta=tensor([ 3.1286, -1.7215,  2.6891], requires_grad=True)
loss=123.59443664550781;theta=tensor([ 2.8058, -0.8948,  1.7775], requires_grad=True)
loss=144.82211303710938;theta=tensor([ 2.4594, -0.1241,  0.9196], requires_grad=True)
loss=197.0822296142578;theta=tensor([2.1257, 0.6030, 0.1401], requires_grad=True)
loss=454.7916259765625;theta=tensor([ 1.7994,  1.2222, -0.5215], requires_grad=True)
loss=672.1536254882812;theta=tensor([ 1.4390,  1.6701, -1.0648], requires_grad=True)
loss=402.4933776855469;theta=tensor([ 1.1109,  2.0062, -1.5661], requires_grad=True)
loss=784.4359741210938;theta=tensor([ 0.8341,  2.2291, -1.9096], requires_grad=True)
loss=326.7474060058594;theta=tensor([ 0.5518,  2.4344, -2.1612], requires_grad=True)
loss=656.53662109375;theta=tensor([ 0.2400,  2.4741, -2.3516], requires_grad=True)
loss=1191.728271484375;theta=tensor([-0.1010,  2.3200, -2.3443], requires_grad=True)
loss=474.1963195800781;theta=tensor([-0.4540,  2.1167, -2.2311], requires_grad=True)
loss=1050.075927734375;theta=tensor([-0.7888,  1.7115, -1.9715], requires_grad=True)
loss=882.9622192382812;theta=tensor([-1.0361,  1.1817, -1.5684], requires_grad=True)
loss=524.0263061523438;theta=tensor([-1.2242,  0.5748, -1.0761], requires_grad=True)
loss=347.0560302734375;theta=tensor([-1.3641, -0.0977, -0.5314], requires_grad=True)
loss=330.15216064453125;theta=tensor([-1.5000, -0.8499,  0.0946], requires_grad=True)
loss=227.52728271484375;theta=tensor([-1.6143, -1.6489,  0.7838], requires_grad=True)
loss=235.7517547607422;theta=tensor([-1.6353, -2.4838,  1.4942], requires_grad=True)
loss=200.72230529785156;theta=tensor([-1.5985, -3.3616,  2.2604], requires_grad=True)
loss=227.18133544921875;theta=tensor([-1.4603, -4.2472,  3.0725], requires_grad=True)
loss=145.7969512939453;theta=tensor([-1.2475, -5.1074,  3.8953], requires_grad=True)
loss=103.00408172607422;theta=tensor([-0.9966, -5.9170,  4.7048], requires_grad=True)
loss=245.4693603515625;theta=tensor([-0.6626, -6.6339,  5.4978], requires_grad=True)
loss=265.03265380859375;theta=tensor([-0.2661, -7.2382,  6.2913], requires_grad=True)
loss=233.37060546875;theta=tensor([ 0.1829, -7.7771,  7.0385], requires_grad=True)
loss=315.7805480957031;theta=tensor([ 0.7009, -8.2558,  7.7002], requires_grad=True)
loss=645.592529296875;theta=tensor([ 1.2346, -8.5653,  8.2334], requires_grad=True)
loss=145.95962524414062;theta=tensor([ 1.7392, -8.8296,  8.7464], requires_grad=True)
loss=331.44921875;theta=tensor([ 2.2040, -9.0297,  9.1879], requires_grad=True)
loss=942.5620727539062;theta=tensor([ 2.5506, -8.9755,  9.5437], requires_grad=True)
loss=467.2809753417969;theta=tensor([ 2.9046, -8.8704,  9.7770], requires_grad=True)
loss=478.8691101074219;theta=tensor([ 3.3476, -8.6963,  9.8919], requires_grad=True)
loss=850.0372924804688;theta=tensor([ 3.7364, -8.3141,  9.9146], requires_grad=True)
loss=608.9434814453125;theta=tensor([ 4.0938, -7.8360,  9.8162], requires_grad=True)
loss=647.3972778320312;theta=tensor([ 4.3668, -7.2417,  9.6103], requires_grad=True)
loss=211.79762268066406;theta=tensor([ 4.6553, -6.6351,  9.3280], requires_grad=True)
loss=167.3760223388672;theta=tensor([ 4.9267, -6.0448,  8.9791], requires_grad=True)
loss=264.1585388183594;theta=tensor([ 5.1460, -5.4350,  8.5622], requires_grad=True)
loss=265.04119873046875;theta=tensor([ 5.3491, -4.7818,  8.0559], requires_grad=True)
loss=227.40081787109375;theta=tensor([ 5.5051, -4.0909,  7.4859], requires_grad=True)
loss=292.4118957519531;theta=tensor([ 5.5879, -3.3750,  6.8214], requires_grad=True)
loss=226.05003356933594;theta=tensor([ 5.5892, -2.7248,  6.0967], requires_grad=True)
loss=249.56619262695312;theta=tensor([ 5.4731, -2.0723,  5.3298], requires_grad=True)
loss=152.64756774902344;theta=tensor([ 5.2790, -1.4308,  4.5453], requires_grad=True)
loss=285.6107177734375;theta=tensor([ 4.9527, -0.8647,  3.7928], requires_grad=True)
loss=122.79955291748047;theta=tensor([ 4.5621, -0.3228,  3.0252], requires_grad=True)
loss=121.6588363647461;theta=tensor([4.1250, 0.1900, 2.2874], requires_grad=True)
loss=275.0060729980469;theta=tensor([3.6451, 0.6170, 1.6285], requires_grad=True)
loss=233.96275329589844;theta=tensor([3.1393, 0.9647, 1.0111], requires_grad=True)
loss=386.85406494140625;theta=tensor([2.5418, 1.2107, 0.4645], requires_grad=True)
loss=543.2579956054688;theta=tensor([1.8463, 1.3148, 0.0193], requires_grad=True)
loss=225.1168975830078;theta=tensor([ 1.1082,  1.3510, -0.3929], requires_grad=True)
loss=632.675048828125;theta=tensor([ 0.3948,  1.2323, -0.6947], requires_grad=True)
loss=517.4296264648438;theta=tensor([-0.2903,  0.9903, -0.9111], requires_grad=True)
loss=506.6915283203125;theta=tensor([-0.8938,  0.6999, -1.0073], requires_grad=True)
loss=733.8193969726562;theta=tensor([-1.4347,  0.2982, -0.9444], requires_grad=True)
loss=292.29241943359375;theta=tensor([-1.9209, -0.0920, -0.7960], requires_grad=True)
loss=283.09063720703125;theta=tensor([-2.3847, -0.5569, -0.6015], requires_grad=True)
loss=567.23388671875;theta=tensor([-2.8089, -1.1203, -0.2652], requires_grad=True)
loss=229.3523406982422;theta=tensor([-3.1847, -1.6750,  0.1263], requires_grad=True)
loss=452.8999328613281;theta=tensor([-3.4409, -2.3028,  0.5568], requires_grad=True)
loss=262.04522705078125;theta=tensor([-3.6271, -2.8785,  1.0422], requires_grad=True)
loss=304.97216796875;theta=tensor([-3.7350, -3.4740,  1.5778], requires_grad=True)
loss=386.376220703125;theta=tensor([-3.7450, -4.0822,  2.1946], requires_grad=True)
loss=290.7869567871094;theta=tensor([-3.6587, -4.6811,  2.8225], requires_grad=True)
loss=281.5928039550781;theta=tensor([-3.4840, -5.2947,  3.5097], requires_grad=True)
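One possible cause (just a guess from reading the code, not verified against your run): batch() calls loss.backward() on every iteration but never clears self.theta.grad, so PyTorch keeps adding the gradients of all earlier batches into every new update, which would make theta overshoot and swing back and forth exactly like the log above. A minimal sketch of batch() with the gradient zeroed after each step:

    def batch(self, x, y):
        loss = self.loss(x, y).sum()
        loss.backward()
        with torch.no_grad():
            self.theta -= self.eta * self.theta.grad / x.shape[0]
            self.theta.grad.zero_()  # clear the accumulated gradient before the next backward()
        print(f'loss={loss.item():.4f}; theta={self.theta.reshape(-1)}')

With that change (and possibly a few more iterations) theta should drift toward true_w and true_b instead of oscillating.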

This is how I calculated it.

Section 3.1.11 says: "Therefore, we can now write out the likelihood of observing a particular y for a given x."
Shouldn't it instead be: given x and y, we assess how plausible the values of the noise-model parameters w and b are?
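For reference, the formula that 3.1.11 is describing: the model assumes y = w⊤x + b + ε with Gaussian noise ε ~ N(0, σ²), so for a given x the observation y is a random variable with density

P(y | x) = 1/√(2πσ²) · exp(−(y − w⊤x − b)² / (2σ²)).

The way I read it, w and b are the parameters being estimated, not quantities being observed, which is why the likelihood is written for y given x. Maximizing ∏_i P(yi | xi) over w and b (equivalently, minimizing the negative log-likelihood) drops the constant terms and leaves ∑_i (yi − w⊤xi − b)², so the maximum likelihood estimate coincides with the least-squares solution. That is also the answer to the earlier maximum-likelihood question: under the Gaussian noise assumption, linear regression's squared-error fit is itself a maximum likelihood estimate.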