Custom Layers

https://d2l.ai/chapter_builders-guide/custom-layer.html


Not a big deal, but MyLinear should be named MyDense in the PyTorch example for consistency with the text.

In the first PyTorch code chunk of Section 5.4.2, we should use self.weight and self.bias rather than self.weight.data and self.bias.data if we want gradients to exist for backpropagation.
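Something like the following, with the .data accesses dropped (a sketch based on the MyLinear layer from that section):

import torch
from torch import nn
from torch.nn import functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # use the Parameters directly (not .data) so autograd tracks them
        return F.relu(torch.matmul(X, self.weight) + self.bias)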

I could not understand the meaning of the formula

y_k = \sum_{i, j} W_{ijk} x_i x_j

which computes a tensor reduction.
I don’t know the shapes of the inputs and the output.
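(For what it’s worth, one common reading, not confirmed by the text: x is a vector of length n, W is an n × n × k parameter tensor, and y is a vector of length k. A minimal einsum sketch under those assumed shapes:)

import torch

n, k = 4, 3
x = torch.randn(n)                       # input vector of length n (assumed)
W = torch.randn(n, n, k)                 # weight tensor of shape (n, n, k) (assumed)
y = torch.einsum('i,ijk,j->k', x, W, x)  # y_k = sum_{i,j} W_{ijk} x_i x_j
print(y.shape)                           # torch.Size([3])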


Personally, I find this set of exercises very difficult to understand. Any leads on these?

Exercises

1. Design a layer that takes an input and computes a tensor reduction, i.e., it returns y_k = \sum_{i, j} W_{ijk} x_i x_j.
2. Design a layer that returns the leading half of the Fourier coefficients of the data.

Exercises and my silly answers

  1. Design a layer that takes an input and computes a tensor reduction, i.e., it returns y_k = \sum_{i, j} W_{ijk} x_i x_j.
  • Not sure what is expected, but is this the answer?
import torch
from torch import nn
from torch.nn import functional as F

class LayerOne(nn.Module):
    def __init__(self, first, second):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(first, second))
        self.bias = nn.Parameter(torch.randn(second))  # note: unused in forward

    def forward(self, X1, X2):
        # computes a bilinear form X1 @ W @ X2 over two inputs, then ReLU
        out = torch.matmul(X1, self.weight)
        out = torch.matmul(out, X2)
        return F.relu(out)
  2. Design a layer that returns the leading half of the Fourier coefficients of the data.
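For the Fourier exercise, a minimal sketch (my assumption: use torch.fft.fft along the last dimension and keep the first half of the coefficients):

import torch
from torch import nn

class HalfFourier(nn.Module):
    def forward(self, X):
        coeffs = torch.fft.fft(X)               # complex Fourier coefficients
        return coeffs[..., : X.shape[-1] // 2]  # keep the leading half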

I do not quite understand the meaning of exercise 1. I read the question as: x is a vector, and there are k weight matrices, each producing one y_i for i from 1 to k.

From the perspective of linear algebra, if x is a column vector, then y_k = x.T (W_k x). In machine learning we often store each datum as a row vector, so I try to implement this algorithm like this:

import torch
from torch import nn

class Layer(nn.Module):
    def __init__(self, N_X):
        super().__init__()
        # weight[k] is the N_X x N_X matrix W_k
        self.weight = nn.Parameter(torch.randn(N_X, N_X, N_X))

    def forward(self, x):
        # x has shape (batch, N_X); y[:, k] = diag(x @ W_k @ x.T)
        y = torch.zeros_like(x)
        for k in range(x.shape[-1]):
            temp = torch.matmul(x, self.weight[k]) @ x.T
            y[:, k] = temp.diagonal()
        return y

Another implementation:

def forward(self, x):
    # stack x @ W_k for every k: shape (k, batch, N_X)
    temp = []
    for k in range(x.shape[-1]):
        temp.append(torch.matmul(x, self.weight[k]).unsqueeze(0))

    # reorder to (batch, k, N_X), then batch-multiply by x: (batch, k)
    XW = torch.cat(temp, 0).permute((1, 0, 2))
    return torch.bmm(XW, x.unsqueeze(-1)).squeeze(-1)
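The loop could also be collapsed into a single einsum; a sketch, assuming x has shape (batch, N_X) and self.weight has shape (N_X, N_X, N_X) with the first axis indexing k, as above:

def forward(self, x):
    # y[b, k] = sum_{i, j} weight[k, i, j] * x[b, i] * x[b, j]
    return torch.einsum('bi,kij,bj->bk', x, self.weight, x)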

I think there is still more room for improvement. Welcome to discuss!

Hello,

I agree that for Exercise 1 the formula computes the quadratic form of the vector x (x.T * A * x) for a specified number of square matrices A (indexed by k). I propose the following layer definition, taking the size of the vector x and the number of matrices A as parameters:

import torch
from torch import nn

class ReductionBlock(nn.Module):
    def __init__(self, size_in, size_out):
        super().__init__()
        # weight[k] is a size_in x size_in matrix A_k
        self.weight = nn.Parameter(torch.randn(size_out, size_in, size_in))

    def forward(self, x):
        # x is a single (unbatched) vector of length size_in
        out = torch.matmul(x, self.weight)  # (size_out, size_in): rows x @ A_k
        out = torch.matmul(out, x)          # (size_out,): y_k = x @ A_k @ x
        return out


my_reduction = ReductionBlock(2, 3)
x = torch.ones(2)
y = my_reduction(x)
print(y)
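As a sanity check, the output could be compared against a direct evaluation of the definition (expected to print True):

# y_k = sum_{i,j} W_{ijk} x_i x_j, computed matrix by matrix
manual = torch.stack([x @ A @ x for A in my_reduction.weight])
print(torch.allclose(y, manual))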

My solutions to the exercises: 6.5