Geometry and Linear Algebraic Operations

The classifier used in the text seems rather unnatural. A more natural way is to flatten the images, normalize them, then take the dot product as discussed in the text as a measure of similarity. The predicted label is the label of the average image that is more similar to the test image, hence the argmax.

# normalize matrices using broadcasting
W = torch.stack([ave_0.flatten().t(), ave_1.flatten().t()], dim=1)
W = W / torch.norm(W, dim=0).reshape(1, -1)
X_test = X_test.reshape(-1, 784)
X_test = X_test / torch.norm(X_test, dim=1).reshape(-1, 1)

# predict and evaluate
y_pred = torch.argmax(X_test @ W, dim=1)
print((y_test == y_pred).type(torch.float).mean())

This obtains an accuracy of ~0.95. :grinning:
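For anyone who wants to try the idea without the notebook state, here is a minimal self-contained sketch. The data is synthetic (two random 784-dimensional "class means" plus Gaussian noise), and the names `means` and `make_samples` are made up for illustration. Only the normalize, dot product, argmax steps mirror the snippet above, so the accuracy it prints says nothing about the real dataset.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the two flattened average images.
means = torch.rand(2, 784)

def make_samples(label, n=100, noise=0.1):
    # Noisy copies of one class mean; synthetic data, not the dataset from the text.
    return means[label] + noise * torch.randn(n, 784)

X_test = torch.cat([make_samples(0), make_samples(1)])
y_test = torch.cat([torch.zeros(100), torch.ones(100)]).long()

# Unit-normalize the columns of W and the rows of X_test so that
# each dot product is a cosine similarity.
W = means.t() / torch.norm(means.t(), dim=0, keepdim=True)   # shape (784, 2)
X_unit = X_test / torch.norm(X_test, dim=1, keepdim=True)    # shape (200, 784)

# Predict the class whose mean is most similar to each sample.
y_pred = torch.argmax(X_unit @ W, dim=1)
accuracy = (y_pred == y_test).float().mean().item()
print(accuracy)
```

Using `keepdim=True` keeps the broadcasting shapes explicit and avoids the separate `reshape` calls.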


What’s the interpretation of A^4 in exercise 7?

I’ve typed up solutions to the exercises in this chapter here (see bottom of the notebook). I’m still seeking guidance on exercise 7.

Any help will be greatly appreciated!

Hey, maybe a little late here; but as far as I understand it, A^4 means a matrix with 4 dimensions.

Based on this section, I think A^4 means A * A * A * A, that is, matrix A multiplied 4 times by itself. It’s the power operator (A to the power of 4) for matrices, if I’m not mistaken.
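A small sketch of the difference between the two readings, assuming the matrix-power interpretation is the intended one. The matrix `A` below is an arbitrary example, not taken from the exercise.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])

matrix_power = torch.linalg.matrix_power(A, 4)  # A @ A @ A @ A
elementwise = A ** 4                            # each entry raised to the 4th power

print(matrix_power)
print(elementwise)
print(torch.equal(matrix_power, A @ A @ A @ A))
```

The two results differ in general, so `A ** 4` in PyTorch is not the same thing as the matrix power.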

In Section 18.1.3. Hyperplanes, it says - “The set of all points where this is true is a line at right angles to the vector w”. Which condition is “this” referring to here? Is it referring to all vectors (or rather, points here) whose projection is equal to 1/||w||?
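One way to check that reading numerically, under the assumption that "this" refers to the condition w · v = 1 (the quoted sentence alone does not say so). The vector `w` and the offsets are arbitrary choices for illustration. The sketch builds several points with w · v = 1 and confirms that each has projection length 1/||w|| onto w.

```python
import torch

w = torch.tensor([2.0, 1.0])

# Start from the point w / ||w||^2, which satisfies w . v = 1,
# then move along a direction orthogonal to w.
base = w / w.dot(w)
orth = torch.stack([-w[1], w[0]])   # orthogonal to w in 2D
points = torch.stack([base + t * orth for t in (-2.0, 0.0, 0.5, 3.0)])

dots = points @ w                   # w . v for each point
proj_len = dots / torch.norm(w)     # length of the projection of v onto w
print(dots)
print(proj_len, 1 / torch.norm(w))
```

All the points differ only by multiples of `orth`, which is why they lie on a line at right angles to `w`.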

# Accuracy
torch.mean(predictions.type(y_test.dtype) == y_test, dtype=torch.float64)

Hi, I think the accuracy calculation here is not right: predictions.type(y_test.dtype) == y_test returns a tensor of Boolean values, and torch.mean() cannot average a Boolean tensor. So I think the correct form is:

torch.mean((predictions.type(y_test.dtype) == y_test).float(), dtype=torch.float64)

In this way the result is exactly the same.
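A tiny standalone sketch of the cast-then-average pattern, with made-up `predictions` and `y_test` values so it runs without the notebook. It also shows an equivalent count-based form.

```python
import torch

# Hypothetical toy values, only to illustrate the dtype handling.
predictions = torch.tensor([1, 0, 1, 1, 0])
y_test = torch.tensor([1, 0, 0, 1, 0])

correct = predictions.type(y_test.dtype) == y_test     # dtype is torch.bool
accuracy = correct.float().mean()                       # cast before averaging
accuracy_alt = correct.sum().item() / correct.numel()  # same value via counting

print(correct.dtype, accuracy.item(), accuracy_alt)
```

Casting explicitly should behave the same regardless of how a given torch version treats Boolean inputs to `torch.mean()`.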

Hi @Tate_Zenith,

Yes, the current implementation raises a runtime error with the latest version, torch==1.10.2. It ran fine with torch==1.8.1.

I’ll send a fix to support the latest version. We’re working on making the master branch support the latest versions of all the frameworks.

That sounds great, thanks!

Section 18.1.6:

… C has rank two since, for instance, the first two columns are linearly independent, however any of the four collections of three columns are dependent.


  • Which four collections of three columns? There are 5C3=10 collections of 3 columns each that can be formed from the 5 columns. So, which 4 out of the 10 are being discussed here?
  • Perhaps, it can be made clearer by giving the proof of linear dependence of one of the collections.
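A sketch of how one could check every collection of three columns programmatically. The matrix below is a hypothetical rank-2 example built as a product of a 4x2 and a 2x5 factor; it is not claimed to be the matrix C from the book. For any rank-2 matrix, all 10 three-column collections come out dependent, which suggests the count in the quoted sentence is what needs fixing.

```python
import itertools
import torch

# Hypothetical factors; their product has rank at most 2.
U = torch.tensor([[1., 0.], [0., 1.], [1., 1.], [2., -1.]], dtype=torch.float64)
V = torch.tensor([[1., 3., 0., -1., 0.],
                  [-1., 0., 1., 1., -1.]], dtype=torch.float64)
C = U @ V

rank_C = torch.linalg.matrix_rank(C).item()
print(rank_C)

# Rank of every 3-column sub-matrix: 5 choose 3 = 10 collections.
triple_ranks = {
    cols: torch.linalg.matrix_rank(C[:, list(cols)]).item()
    for cols in itertools.combinations(range(5), 3)
}
for cols, r in triple_ranks.items():
    print(cols, r)
```

A rank below 3 for a triple means its three columns are linearly dependent.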

For exercise 7, this is what I think the solution is:
PS: this solution treats A^4 as repeated matrix multiplication, not element-wise multiplication.

$$D = A^4 = A^2 A^2 = B^2, \quad B = A^2$$
$$\operatorname{tr}(D) = \sum_i d_{ii}$$
$$d_{ii} = \sum_j b_{ij} b_{ji}$$
$$b_{ij} = \sum_k a_{ik} a_{kj}$$
$$d_{ii} = \sum_j \Big(\sum_k a_{ik} a_{kj}\Big)\Big(\sum_l a_{jl} a_{li}\Big)$$
$$d_{ii} = \sum_j \sum_k \sum_l a_{ik} a_{kj} a_{jl} a_{li}$$
$$\operatorname{tr}(D) = \sum_i \sum_j \sum_k \sum_l a_{ik} a_{kj} a_{jl} a_{li}$$
tr(A^4) = torch.einsum('ik,kj,jl,li', A, A, A, A)

I verified this answer with random matrices and it worked

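import torch

size = 4
A = torch.randint(1, 100, (size, size))
# Reference value: trace of A @ A @ A @ A via explicit matrix products
tr_A_4 = torch.trace(torch.matmul(torch.matmul(A, A), torch.matmul(A, A)))
# Same quantity as a single Einstein summation over the cyclic index chain
tr_A_4_einsum = torch.einsum('ik, kj, jl, li', A, A, A, A)
assert tr_A_4 == tr_A_4_einsum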

hope this can help you out:
torch.einsum("ij,jk,kl,lm->im", A, A, A, A).trace()

I don’t really understand the Tensors and Common Linear Algebra Operations subsection. The sum at 22.1.32 is a bit confusing.


  • Using the function angle(), we compute
import torch

def angle(x, y):
    """Computes the angle between two vectors."""
    return torch.acos(torch.dot(x, y) / (torch.linalg.norm(x) * torch.linalg.norm(y)))

rad = angle(
    torch.tensor([1, 0, -1, 2], dtype=torch.float32),
    torch.tensor([3, 1, 0,  1], dtype=torch.float32)
)
rad, rad / 3.14 * 180

Output (in rad and degree, respectively):

(tensor(0.9078), tensor(52.0412))
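A side note on the degree conversion: dividing by 3.14 is only an approximation of pi, which is why the value above ends in .04. A sketch of the same computation using `torch.rad2deg` (equivalently `math.pi`), which should give roughly 52.01 degrees instead:

```python
import math
import torch

x = torch.tensor([1, 0, -1, 2], dtype=torch.float32)
y = torch.tensor([3, 1, 0, 1], dtype=torch.float32)

# cos(theta) = x . y / (||x|| ||y||)
rad = torch.acos(torch.dot(x, y) / (torch.linalg.norm(x) * torch.linalg.norm(y)))
deg = torch.rad2deg(rad)   # same as rad * 180 / math.pi
print(rad.item(), deg.item(), rad.item() * 180 / math.pi)
```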


  • True. We can verify straightforwardly by definition.
M1 = torch.tensor([[1, 2], [0, 1]])
M2 = torch.tensor([[1, -2], [0, 1]])
M1 @ M2, M2 @ M1


(tensor([[1, 0],
         [0, 1]]),
 tensor([[1, 0],
         [0, 1]]))
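As a cross-check, one could also compute the inverse directly and compare. This is just a sketch; `torch.linalg.inv` needs a floating-point input, so the matrices are declared as floats here.

```python
import torch

M1 = torch.tensor([[1., 2.], [0., 1.]])
M2 = torch.tensor([[1., -2.], [0., 1.]])

M1_inv = torch.linalg.inv(M1)
print(M1_inv)
print(torch.allclose(M1_inv, M2))
```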


  • The determinant is the scaling factor of the transformed area. We compute the determinant of the transformation matrix:
torch.det(torch.tensor([[2, 3], [1, 2]], dtype=torch.float32))


  • Since the determinant is 1, the area is left unchanged as 100.
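To see the area claim concretely, here is a sketch that transforms the corners of a shape and measures the area with the shoelace formula. The 10 x 10 square is a hypothetical choice (the exercise only gives the area), and `polygon_area` is a helper written for this illustration.

```python
import torch

def polygon_area(vertices):
    # Shoelace formula for a simple polygon given as an (n, 2) tensor of vertices.
    x, y = vertices[:, 0], vertices[:, 1]
    x_next, y_next = torch.roll(x, -1), torch.roll(y, -1)
    return 0.5 * torch.abs(torch.sum(x * y_next - x_next * y))

T = torch.tensor([[2., 3.], [1., 2.]])

# A 10 x 10 square with area 100, used only as an example shape.
square = torch.tensor([[0., 0.], [10., 0.], [10., 10.], [0., 10.]])
transformed = square @ T.t()   # apply T to each vertex (rows are points)

area_before = polygon_area(square).item()
area_after = polygon_area(transformed).item()
print(area_before, area_after)
```

The square becomes a sheared parallelogram, but its area stays the same because the determinant is 1.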


  • Linear independence is equivalent to a non-zero determinant.
    • Below shows that only the first set of vectors is linearly independent.
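# Compare |det| against a small tolerance, since a floating-point determinant may not be exactly zero
is_linear_indep = lambda X: bool(torch.det(X).abs() > 1e-6)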
X1 = torch.tensor([[1, 2, 3], [0, 1, 1], [-1, -1, 1]], dtype=torch.float32)
X2 = torch.tensor([[3, 1, 0], [1, 1, 0], [1, 1, 0]], dtype=torch.float32)
X3 = torch.tensor([[1, 0, 1], [1, 1, 0], [0, -1, 1]], dtype=torch.float32)
is_linear_indep(X1), is_linear_indep(X2), is_linear_indep(X3)


(True, False, False)
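Testing a floating-point determinant against zero can be sensitive to rounding, so a rank-based check is one alternative. The sketch below uses `torch.linalg.matrix_rank` with its default tolerance; the helper name `is_linearly_independent` is made up here.

```python
import torch

def is_linearly_independent(X):
    # A square matrix has independent rows (and columns) exactly when it has full rank.
    return torch.linalg.matrix_rank(X).item() == X.shape[0]

X1 = torch.tensor([[1, 2, 3], [0, 1, 1], [-1, -1, 1]], dtype=torch.float32)
X2 = torch.tensor([[3, 1, 0], [1, 1, 0], [1, 1, 0]], dtype=torch.float32)
X3 = torch.tensor([[1, 0, 1], [1, 1, 0], [0, -1, 1]], dtype=torch.float32)

results = [is_linearly_independent(X) for X in (X1, X2, X3)]
print(results)
```

The rank also says how many of the vectors are independent, which the determinant alone does not.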


  • It is true. By definition,


  • We take the dot product,
    Denoting image, one has the condition for orthogonality which reads


  • We hierarchically “expand” the summation according to the definition of matrix multiplication:

    • The corresponding Einstein summation notation should be ij, jk, kl, li ->, where an empty string is used to represent a scalar.
# Check our result
A = torch.randn(3, 3)
einsum = torch.einsum("ij, jk, kl, li ->", A, A, A, A)
manual = torch.trace(A @ A @ A @ A)
einsum.item(), manual.item()


(141.1223907470703, 141.12237548828125)