http://d2l.ai/chapter_linearnetworks/linearregression.html
Q3. I would appreciate it if someone could correct me if I'm wrong. This is my answer:

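Assume that $y = Xw + b + \epsilon$ with $p(\epsilon) = \frac{1}{2} e^{-|\epsilon|} = \frac{1}{2} e^{-|y - \hat{y}|}$. So $P(y \mid x) = \frac{1}{2} e^{-|y - \hat{y}|}$.
Negative log-likelihood: $LL(y \mid x) = -\log P(y \mid x) = -\log \prod_i p(y^{(i)} \mid x^{(i)}) = \sum_i \left( \log 2 + |y^{(i)} - \hat{y}^{(i)}| \right) = \sum_i \left( \log 2 + |y^{(i)} - X^{(i)} w - b| \right)$
$\nabla_w LL(y \mid x) = X^\top \frac{Xw - y}{|Xw - y|} = 0$ (division taken elementwise). From this equation, we get that: (1) $w = (X^\top X)^{-1} X^\top y$ and (2) $w \neq X^{-1} y$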
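One way to check an answer like this is to evaluate the negative log-likelihood and its (sub)gradient numerically. Below is a minimal NumPy sketch, not a definitive implementation. The helper names `laplace_nll` and `laplace_nll_subgrad`, the synthetic data, and the choice to keep the bias `b` in the gradient are all assumptions made for illustration. Since the elementwise ratio (Xw + b - y)/|Xw + b - y| is just the sign of each residual, the gradient depends only on residual signs; this is the least-absolute-deviations problem, which in general has no closed-form solution, so the least-squares formula in (1) is worth double-checking against it.

```python
import numpy as np

def laplace_nll(w, b, X, y):
    """Negative log-likelihood under p(eps) = 1/2 * exp(-|eps|)."""
    residual = y - (X @ w + b)
    return np.sum(np.log(2.0) + np.abs(residual))

def laplace_nll_subgrad(w, b, X, y):
    """Subgradient of the NLL with respect to w and b."""
    # Elementwise (Xw + b - y) / |Xw + b - y| is the sign of the residual.
    s = np.sign(X @ w + b - y)
    return X.T @ s, np.sum(s)

# Synthetic data (an assumption for illustration): linear model + Laplace noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_w, true_b = np.array([2.0, -3.4, 1.0]), 4.2
y = X @ true_w + true_b + rng.laplace(scale=1.0, size=n)

# Least-squares fit (with a bias column), i.e. the candidate from formula (1).
Xb = np.hstack([X, np.ones((n, 1))])
theta_ls = np.linalg.lstsq(Xb, y, rcond=None)[0]
w_ls, b_ls = theta_ls[:-1], theta_ls[-1]

# If the least-squares fit minimized the NLL, this subgradient would be ~0.
g_w, g_b = laplace_nll_subgrad(w_ls, b_ls, X, y)
print("NLL at least-squares fit:", laplace_nll(w_ls, b_ls, X, y))
print("subgradient w.r.t. w:", g_w, " w.r.t. b:", g_b)
```

If the printed subgradient is clearly nonzero, the least-squares solution is not a stationary point of this loss for that data set.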

I will update soon.
hcmut
Honestly, I don't know exactly what the answer to each question is. So I just tried to write code based on the questions and the topics that I understand. This is the code I wrote.
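import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# You can change the values of x and y as you like.
# alp (the learning rate) also affects convergence, so adjust it properly.
y = tf.random.normal([12], 0, 1, tf.float32).numpy()  # NumPy array, so it mixes cleanly with x
x = np.arange(12)

def minibatch(y, x, alp):
    # Despite the name, every step uses all of x and y, so this is
    # full-batch gradient descent on the squared loss.
    w = 1
    b = 0
    z = np.array([])  # history of b values, one per iteration
    while True:
        out = np.sum(x * (w * x + b - y))  # summed gradient w.r.t. w
        out2 = np.sum(w * x + b - y)       # summed gradient w.r.t. b
        w1 = w - (alp / len(x)) * out
        b1 = b - (alp / len(x)) * out2
        # Stop once the relative change in both parameters is tiny.
        if np.abs((w1 - w) / w1) < 0.00001 and np.abs((b1 - b) / b1) < 0.00001:
            return w, b, z
        else:
            z = np.append(z, b)
            w = w1
            b = b1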
Then we use the output to plot the fitted line, where w is the weight and z is the array that stores the b values from each iteration.
w,b,z= minibatch(y,x, 0.01)
plt.plot(x, y, 'bx')
plt.plot(x, w*x+b)
plt.grid()
plt.show()
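The loop above takes gradient steps on the squared loss using every data point at once. For the exponential-noise model in Q3, a minibatch version on the absolute loss might look like the sketch below. This is only one possible approach under stated assumptions: the function name `minibatch_sgd_l1`, the synthetic data, and the 1/(1 + epoch) learning-rate decay are all choices made here for illustration. The decay is included because sign-based updates keep a constant magnitude and would otherwise keep bouncing around the minimum.

```python
import numpy as np

def minibatch_sgd_l1(X, y, lr=0.1, batch_size=10, num_epochs=30, seed=0):
    """Minibatch SGD on the mean absolute error |y - (Xw + b)|."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for epoch in range(num_epochs):
        step = lr / (1 + epoch)          # decaying step size (an assumption)
        idx = rng.permutation(n)         # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # Subgradient of |prediction - target| is the sign of the residual.
            s = np.sign(X[batch] @ w + b - y[batch])
            w = w - step * (X[batch].T @ s) / len(batch)
            b = b - step * np.mean(s)
    return w, b

# Synthetic data (an assumption for illustration): linear model + Laplace noise.
rng = np.random.default_rng(42)
true_w, true_b = np.array([2.0, -3.4]), 4.2
X = rng.normal(size=(1000, 2))
y = X @ true_w + true_b + rng.laplace(scale=1.0, size=1000)

w_hat, b_hat = minibatch_sgd_l1(X, y)
print("estimated w:", w_hat, "estimated b:", b_hat)
```

With a fixed step size the parameters tend to oscillate near the stationary point, so shrinking the step over time (or stopping early) is one way to settle down.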
Hopefully this helps.