Some questions on Linear Regression Exercises

I have some questions regarding the exercises from the Linear Regression Chapter here.

Q3. I don’t think I understand this statement: “How would you formulate this in a deep network?”

Q4. 3) How can I determine the expected value of the design matrix X’X in this case?
4) I don’t understand this question.

Q5. 2) From my calculations, for a single example, -log P(y|X) = |eps| + log 2. Is the optimal value for X then the median of all the Xs?
3) In my understanding, the gradient of the log probability will fluctuate between -1 and +1 near zero. But how can I mitigate this effect?
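
To make my reasoning for Q5 concrete, here is a minimal sketch of what I mean. It assumes the noise density p(eps) = 1/2 exp(-|eps|) and, only for illustration, an intercept-only model y_hat = b. The toy targets and the helper names (laplace_nll, nll_grad_wrt_prediction, nll_for_constant) are made up by me and are not from the book.

```python
import math
import statistics


def laplace_nll(residuals):
    """Negative log-likelihood under p(eps) = 0.5 * exp(-|eps|), summed over examples."""
    return sum(abs(r) for r in residuals) + len(residuals) * math.log(2)


def nll_grad_wrt_prediction(residual):
    """Derivative of |y - y_hat| with respect to y_hat, i.e. -sign(y - y_hat)."""
    if residual > 0:
        return -1.0
    if residual < 0:
        return 1.0
    return 0.0  # one possible subgradient choice at exactly zero


# Toy targets (made-up numbers) for the intercept-only model y_hat = b.
y = [1.0, 2.0, 3.0, 10.0, 50.0]


def nll_for_constant(b):
    """NLL of the toy targets when every prediction equals the constant b."""
    return laplace_nll([yi - b for yi in y])


median_b = statistics.median(y)
mean_b = statistics.mean(y)

# In this toy case the median gives a lower NLL than the mean.
print(nll_for_constant(median_b), nll_for_constant(mean_b))

# The gradient keeps magnitude 1 however small the residual is,
# which is the fluctuation between -1 and +1 that I am asking about in 3).
grads_near_zero = [nll_grad_wrt_prediction(r) for r in (1e-6, -1e-6)]
print(grads_near_zero)
```

In this toy setting the median beats the mean, which is why I suspect the median is involved, but I am not sure how this carries over to the full regression case with a weight vector.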

Q7. 1) I don’t understand why the Gaussian noise assumption is not appropriate, other than the fact that a Gaussian deals with continuous values and a Poisson deals with discrete values.

Any help would be greatly appreciated.