Linear Algebra

Typically a vector is regarded as a column vector, so its transpose $\mathbf{a}^T=[a_0, a_1, a_2, \cdots, a_n]$ is a row vector. If you then expand each element, you get back the original matrix.

Ex1-3.

  • The transpose of transpose
    $(\mathbf{A}^T)^T = \mathbf{A}$

  • Commutativity of transpose and addition
    $\mathbf{A}^T + \mathbf{B}^T = (\mathbf{A} + \mathbf{B})^T$

  • Symmetrization of a matrix (a quick numerical check of all three properties follows this list)
    $(\mathbf{A} + \mathbf{A}^T)^T = \mathbf{A}^T + (\mathbf{A}^T)^T = \mathbf{A}^T + \mathbf{A} = \mathbf{A} + \mathbf{A}^T$

    • The first equality is based on Ex2 (commutativity of transpose and addition), and the second equality is based on Ex1 (transpose of transpose);
    • The last equality holds due to the commutativity of addition, so $\mathbf{A} + \mathbf{A}^T$ equals its own transpose, i.e., it is symmetric.
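A minimal numerical sketch of these three properties in PyTorch; the tensors A, B, and M below are arbitrary examples chosen here for illustration:

import torch

A = torch.randn(2, 3)
B = torch.randn(2, 3)
M = torch.randn(3, 3)

print(torch.equal(A.T.T, A))              # Ex1: (A^T)^T == A
print(torch.equal(A.T + B.T, (A + B).T))  # Ex2: A^T + B^T == (A + B)^T
S = M + M.T                               # Ex3: symmetrize a square matrix
print(torch.equal(S, S.T))                # S equals its own transpose

All three comparisons print True.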

Ex4-5.

  • The built-in len() function, as with NumPy arrays, returns the length of the first axis of a given tensor.
  • The basis is the magic method __len__ implemented in the corresponding classes (e.g., arrays or tensors).
import torch
X = torch.arange(24).reshape(2, 3, 4)
len(X)

Output:

2
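As a quick check (reusing X from above), len(X) agrees with the size of the first axis and with calling the magic method directly:

len(X) == X.shape[0] == X.__len__()

Output:

True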

Ex6 & 8

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
A / A.sum(axis=1)

Output:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 2
      1 A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
----> 2 A / A.sum(axis=1)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
  • The size of A is (2,3), and A.sum(axis=1) yields a reduced summation tensor in shape (2, ).
    • Broadcasting aligns shapes starting from the trailing axis, so (2, 3) against (2,) would require the last axes, 3 and 2, to match or one of them to be 1; they don't, hence neither the plain element-wise division / nor its broadcast version can be performed.
A.shape, A.sum(axis=1).shape

Output:

(torch.Size([2, 3]), torch.Size([2]))
  • The operation that this problem really “wants” to perform is probably a row-wise normalization, where each element $a_{ij}$ is divided by the sum of all elements in the row where $a_{ij}$ lies, so that each row of the resulting tensor sums up to 1. But since the particular dimension to sum over has been reduced away, the broadcasting scheme fails.
    • A solution: use keepdims=True to preserve the axis. After the summation, this axis is kept with size 1 instead of being removed, so the result can broadcast against A (a sanity check follows the snippet below).
A / A.sum(axis=1, keepdims=True)

Output:

tensor([[0.0000, 0.3333, 0.6667],
        [0.2500, 0.3333, 0.4167]])
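As a sanity check, summing each row of the normalized tensor gives 1 (up to floating-point rounding):

(A / A.sum(axis=1, keepdims=True)).sum(axis=1)  # each entry is (approximately) 1.0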
  • A closer look at the reduced summation operation
B = torch.arange(24).reshape(2, 3, 4)
B.shape, B.sum(axis=0).shape, B.sum(axis=1).shape, B.sum(axis=2).shape

Output:

(torch.Size([2, 3, 4]),
 torch.Size([3, 4]),
 torch.Size([2, 4]),
 torch.Size([2, 3]))
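The keepdims option applies here as well: the summed-over axis is retained with size 1, so the result can still broadcast against B. For example:

B.sum(axis=1, keepdims=True).shape

Output:

torch.Size([2, 1, 4])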

Ex7.

  • In downtown Manhattan, streets are laid out on a rectilinear grid, and one has to travel from a point $(x_1, y_1)$ to a point $(x_2, y_2)$ along a trajectory composed of only horizontal and vertical segments.
    • In such a case, the shortest distance to travel is $|x_1 - x_2| + |y_1 - y_2|$.
  • For a space of dimension $n$, the Manhattan distance between two points $\mathbf{x}$ and $\mathbf{y}$ is defined as
    $d_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$
    • This distance is used in signal processing, machine learning models, etc. (a small PyTorch sketch follows this list).
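A minimal sketch of the Manhattan (L1) distance in PyTorch, using two arbitrary example points u and v; it also checks against torch.linalg.norm with ord=1, which computes the vector L1 norm:

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 0.0, 1.0])
torch.abs(u - v).sum(), torch.linalg.norm(u - v, ord=1)

Output:

(tensor(7.), tensor(7.))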

Ex9.

  • By default, the torch.linalg.norm() function computes the Frobenius norm (for tensors of order two or higher, which are flattened first) or the L2 norm (for order-one tensors, i.e., vectors), as per the documentation. Either way, the result is the square root of the sum of squares of all elements:
    $\|\mathbf{X}\| = \sqrt{\sum_{i} x_i^2}$, where the index $i$ runs over every element of the tensor.

  • Below is an example with an order-4 tensor:

torch.manual_seed(706)
X = torch.randn(2, 3, 4, 5)
torch.linalg.norm(X), ((X ** 2).sum()) ** 0.5

Output:

(tensor(11.9074), tensor(11.9074))
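The vector case can be checked in the same way; for an order-1 tensor, torch.linalg.norm() defaults to the L2 norm:

v = torch.tensor([3.0, 4.0])
torch.linalg.norm(v), ((v ** 2).sum()) ** 0.5

Output:

(tensor(5.), tensor(5.))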

Ex12.

A = torch.ones(100, 200, dtype=torch.int32)
B = torch.ones_like(A) * 2
C = torch.ones_like(A) * 3
stacked = torch.stack([A, B, C])
stacked.shape

Output:

torch.Size([3, 100, 200])
  • By default, the torch.stack() function creates a new dimension at axis 0 (controlled by its dim argument), along which the same-sized tensors are stacked, as per the documentation.
    • To slice out B, index position 1 along the first axis.
(stacked[1] == B).sum() == B.numel()

Output:

tensor(True)
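The dim argument mentioned above controls where the new axis is inserted; stacking the same three tensors along axis 1, for example, gives:

torch.stack([A, B, C], dim=1).shape

Output:

torch.Size([100, 3, 200])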

Exercise 10: let me clarify that A, B, and C are chosen here with the same dimensions.

# PyTorch
import time
import torch

# Define matrix dimensions
dim_A = (216, 216)
dim_B = (216, 216)
dim_C = (216, 216)

# Generate random matrices A, B, and C
A = torch.randn(*dim_A)
B = torch.randn(*dim_B)
C = torch.randn(*dim_C)

# Compute (AB)C and measure time and memory
start_time = time.time()
AB = torch.matmul(A, B)
ABC = torch.matmul(AB, C)
execution_time_ABC = time.time() - start_time
memory_usage_ABC = ABC.element_size() * ABC.numel() / (1024 * 1024)  # bytes to MB

# Compute AC^T and measure time and memory
start_time = time.time()
AC_T = torch.matmul(A, C.T)
execution_time_ACT = time.time() - start_time
memory_usage_ACT = AC_T.element_size() * AC_T.numel() / (1024 * 1024)  # bytes to MB

# Print results
print("(AB)C:")
print("Execution time:", execution_time_ABC, "seconds")
print("Memory usage:", memory_usage_ABC, "MB")
print("\nAC^T:")
print("Execution time:", execution_time_ACT, "seconds")
print("Memory usage:", memory_usage_ACT, "MB")

Output:

(AB)C:
Execution time: 0.0015840530395507812 seconds
Memory usage: 0.177978515625 MB

AC^T:
Execution time: 0.0008935928344726562 seconds
Memory usage: 0.177978515625 MB

Time and space complexity

Note: space complexity = memory footprint

  • $(AB)C$: this requires two matrix multiplications, first $A \times B$ and then $(AB) \times C$. If we go deeper into the pseudo-code of matrix multiplication:
def multiply_2_matrix(matrix_a, matrix_b):
    rows_a = len(matrix_a)
    cols_a = len(matrix_a[0])
    rows_b = len(matrix_b)
    cols_b = len(matrix_b[0])
    
    if cols_a != rows_b:
        raise ValueError("Number of columns in matrix_a must match the number of rows in matrix_b")
    
    result = [[0] * cols_b for _ in range(rows_a)]
    
    # iterate over the rows of A
    for i in range(rows_a):
        # iterate over the columns of B
        for j in range(cols_b):
            # iterate over the inner dimension (columns of A = rows of B)
            for k in range(cols_a):
                # result[i][j] accumulates the sum of a[i][k] * b[k][j] over k
                result[i][j] += matrix_a[i][k] * matrix_b[k][j]
    
    return result
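As a quick usage example (an illustration only), the triple loop above agrees with torch.matmul on a small 3x2 by 2x2 case:

a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
print(multiply_2_matrix(a, b))            # [[25, 28], [57, 64], [89, 100]]
print(torch.tensor(a) @ torch.tensor(b))  # same values, as a tensor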

So in matrix multiplication:

    Time complexity: multiplying an $i \times j$ matrix by a $j \times k$ matrix with the triple loop above takes $O(ijk)$ operations. Here all matrices are square ($n \times n$), so each multiplication costs $O(n^3)$, and the two multiplications ($A \times B$, then $(AB) \times C$) cost $2 \cdot O(n^3)$, which is still $O(n^3)$.

    Space complexity: $O(n^2)$, since the intermediate $AB$ and the final result each store $n^2$ entries; the second multiplication writes into a result buffer of the same size rather than allocating asymptotically more memory.


  • $AC^T$: this requires only one matrix multiplication, $A \times C^T$, plus one transpose of $C$. If we go deeper into the pseudo-code of the matrix transpose:
def transpose_matrix(matrix):
    # the transposed matrix, built row by row
    transposed = []
    # iterate over the columns of the input
    for i in range(len(matrix[0])):
        transposed_row = []
        # column i of the input becomes row i of the output
        for row in matrix:
            transposed_row.append(row[i])
        transposed.append(transposed_row)
    return transposed

So, combining one transpose with one matrix multiplication:

    Time complexity: the single multiplication $A \times C^T$ costs $O(n^3)$, and transposing $C$ costs $O(n^2)$. The transpose is dominated by the multiplication, so the total is still $O(n^3)$, the same asymptotic class as $(AB)C$, but with only one multiplication instead of two.

    Space complexity: $O(n^2)$. $C^T$ does allocate new memory, but only a temporary buffer of the same $n^2$ size; the product with $A$ is then stored in an $n^2$-entry result buffer.


That is why $(AB)C$ takes roughly twice as long as $AC^T$, while the memory footprints are equal.
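A back-of-the-envelope operation count for the square $n \times n$ matrices used above (an illustration of the argument, not a measurement):

n = 216
ops_ABC = 2 * n**3      # two n x n multiplications: A @ B, then (AB) @ C
ops_ACT = n**3 + n**2   # one multiplication plus one transpose pass over C
print(ops_ABC, ops_ACT, ops_ABC / ops_ACT)  # the ratio is close to 2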