Working with Sequences

A few questions about the MLP:

def get_net():
    net = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 1))
    net.apply(init_weights)
    return net
  1. Is it correct to call this Multi-Layer Perceptron an RNN? Or does calling something an RNN depend only on having a sliding-window training and label set?

  2. tau is 4 in this case, correct? What do both 10s mean contextually? (See the shape sketch below.)
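In case it helps, here is a minimal sketch of the shapes (my own annotation, not from the book): the first 10 is the number of outputs of nn.Linear(4, 10) and the second 10 is the number of inputs of nn.Linear(10, 1), so both refer to the width of the single hidden layer, while the 4 corresponds to the tau past observations fed in.

import torch
from torch import nn

# Same architecture as get_net(): a window of tau = 4 past values in, one prediction out
net = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 1))

x = torch.randn(996, 4)   # 996 examples, each a window of tau = 4 observations
h = net[0](x)             # first Linear maps 4 inputs to 10 hidden units: shape (996, 10)
y = net(x)                # full forward pass: shape (996, 1), one prediction per window
print(h.shape, y.shape)   # torch.Size([996, 10]) torch.Size([996, 1])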

A few questions about the max steps section:

  1. Are you predicting a sequence of length step size, or are you shifting each window by the step size?

I’m confused about this code in Chapter 9.1. If I understand correctly, our features should be T - tau fragments, each of length tau; why are the features here actually tau fragments, each of length T - tau?

def get_dataloader(self, train):
    features = [self.x[i : self.T-self.tau+i] for i in range(self.tau)]
    self.features = torch.stack(features, 1)
    self.labels = self.x[self.tau:].reshape((-1, 1))
    i = slice(0, self.num_train) if train else slice(self.num_train, None)
    return self.get_tensorloader([self.features, self.labels], train, i)

My solutions to the exercises in 9.1:


torch.stack(features, 1) converts the features list, whose overall shape is (4, 996), into a tensor of shape (996, 4), which means 996 samples with 4 features each.
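A toy version of that construction (my own sketch, using T = 10 instead of the book's T = 1000) makes the shapes concrete: each row of the stacked features is one window of tau past values, and the label is the value that follows it.

import torch

T, tau = 10, 4
x = torch.arange(T, dtype=torch.float32)              # x = [0, 1, ..., 9]

# Same construction as get_dataloader: tau shifted slices, each of length T - tau
features = [x[i : T - tau + i] for i in range(tau)]   # 4 tensors, each of length 6
features = torch.stack(features, 1)                   # stacked along dim 1: shape (6, 4)
labels = x[tau:].reshape(-1, 1)                       # shape (6, 1)

print(features.shape, labels.shape)   # torch.Size([6, 4]) torch.Size([6, 1])
print(features[0], labels[0])         # tensor([0., 1., 2., 3.]) tensor([4.])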

Is this correct? IMO, ‘overestimate’ should be replaced by ‘underestimate’. If we rely on counting statistics, we are unlikely to encounter the infrequent words, so we will conclude that they have zero or close-to-zero occurrences. That is why the use of ‘overestimate’ is perplexing.

“This should already give us pause for thought if we want to model words by counting statistics. After all, we will significantly overestimate the frequency of the tail, also known as the infrequent words.”

https://d2l.ai/chapter_recurrent-neural-networks/text-sequence.html

9.1. Working with Sequences

I’m quite amazed at how bad the performance became after adding this extra non-linear layer. Does anyone understand why it is so bad? (Trained for 10 epochs.)
"This is all I changed in the net
self.net = nn.Sequential(
nn.LazyLinear(10), # Lazy initialization for an input layer with 4 outputs
nn.ReLU(), # Another non-linear activation function
nn.LazyLinear(1) # Output layer producing 1 output
)
"


def change(features, preds, i):
    # Overwrite the last feature column (rows i+1 onward) with the predictions
    # made for the preceding rows
    features[i+1:, -1] = preds[i:-1, :].reshape(1, -1)
    return features

def shift(features, j, i):
    # Copy column j into column j-1, shifted down by one row, for rows i+1 onward
    features[i+1:, j-1] = features[i:-1, j]
    return features

@d2l.add_to_class(Data)
def insert_kth_pred(self, pred, k):
    for i in range(1, k):
        self.features = shift(self.features, i, self.tau-i)
    self.features = change(self.features, pred, k-1)

for i in range(4):
    preds = model(data.features).detach()
    data.insert_kth_pred(preds, i+1)
    trainer.fit(model, data)
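Not part of the solution above, but a tiny sketch of the idea behind change may help when checking the indexing: each window's most recent feature is replaced by the prediction made for the previous window.

import torch

tau = 4
x = torch.arange(10, dtype=torch.float32)
features = torch.stack([x[i : 6 + i] for i in range(tau)], 1)   # shape (6, 4), as in get_dataloader

# Made-up predictions, one per window, standing in for model(data.features)
preds = torch.full((6, 1), 99.0)

# Replace the most recent observation of each later window with the prediction
# for the previous window; this mirrors what change(features, preds, 0) does
features[1:, -1] = preds[:-1, 0]
print(features)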