import torch.nn as nn

def get_net():
    # 4 inputs (the tau most recent values) -> 10 hidden units -> 1 output
    net = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 1))
    net.apply(init_weights)  # init_weights is defined earlier in the chapter
    return net
Is it correct to call this multi-layer perceptron an RNN? Or does calling something an RNN depend only on training it with a sliding window of inputs and labels?
tau is 4 in this case, correct? And what do the two 10s mean contextually?
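For what it's worth, a quick shape check (my own sketch, not from the book) suggests how to read those numbers: the 4 is tau, the width of the input window, and the two 10s are the output width of the first linear layer and the matching input width of the second, i.e. a single hidden layer of 10 units.

import torch

net = get_net()
X = torch.randn(8, 4)    # a batch of 8 windows, each holding tau = 4 past values
print(net(X).shape)      # torch.Size([8, 1]): one next-step prediction per window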
A few questions about the max-steps section:
Are you predicting a sequence whose length equals the step size, or are you shifting each window by the step size?
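If it helps, here is a minimal sketch (my own paraphrase of the scheme I believe the section uses, not the book's code) in which the window advances one step at a time and, beyond the observed data, each new input is a previous prediction:

import torch

def k_step_ahead(model, x, tau, k):
    # Start from the last tau observed values and roll forward k times,
    # appending each prediction so it becomes an input for the next step.
    window = x[-tau:].clone()
    preds = []
    for _ in range(k):
        p = model(window.reshape(1, -1)).reshape(1)
        preds.append(p)
        window = torch.cat([window[1:], p])  # shift by one, feed prediction back
    return torch.cat(preds)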
I'm confused about this code in Chapter 9.1. If I understand correctly, our features should be T - tau fragments, each of length tau; why does the code here build tau fragments, each of length T - tau?
def get_dataloader(self, train):
    # Each list entry is a slice of length T - tau; stacking along dim 1
    # yields a (T - tau, tau) matrix: one row (window) per example.
    features = [self.x[i : self.T - self.tau + i] for i in range(self.tau)]
    self.features = torch.stack(features, 1)
    self.labels = self.x[self.tau:].reshape((-1, 1))
    i = slice(0, self.num_train) if train else slice(self.num_train, None)
    return self.get_tensorloader([self.features, self.labels], train, i)
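For what it's worth, the two descriptions coincide after torch.stack(features, 1): the list does hold tau tensors of length T - tau, but stacking them along dimension 1 transposes that view, so each of the T - tau rows is a window of length tau. A toy check of my own, with made-up numbers:

import torch

T, tau = 10, 4
x = torch.arange(T, dtype=torch.float32)
features = torch.stack([x[i : T - tau + i] for i in range(tau)], 1)
print(features.shape)  # torch.Size([6, 4]): T - tau rows, each a length-tau window
print(features[0])     # tensor([0., 1., 2., 3.]) -> its label is x[4]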
Is this correct? IMO, ‘overestimate’ should be replaced by ‘underestimate’. If we rely on counting statistics, we are unlikely to encounter the infrequent words at all, so we will conclude they have zero or near-zero frequency. That is why the use of ‘overestimate’ is perplexing.
“This should already give us pause for thought if we want to model words by counting statistics. After all, we will significantly overestimate the frequency of the tail, also known as the infrequent words.”
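FWIW, both effects show up, and I think that is the point: the tail words we do happen to see are assigned an empirical frequency of at least 1/N, which can exceed their true probability many times over (the overestimate), while everything we never see is pinned at zero (your underestimate). A toy simulation of my own, drawing from a Zipfian distribution:

import random
from collections import Counter

random.seed(0)
V, N = 100_000, 10_000
H = sum(1 / r for r in range(1, V + 1))
probs = [1 / (r * H) for r in range(1, V + 1)]   # Zipfian: p(rank r) ~ 1/r
counts = Counter(random.choices(range(V), weights=probs, k=N))

# Tail words we happen to see get empirical frequency >= 1/N, far above truth...
tail = [w for w in counts if w >= 10_000]
ratios = [(counts[w] / N) / probs[w] for w in tail]
print(f"{len(tail)} tail words seen; "
      f"mean empirical/true ratio: {sum(ratios) / len(ratios):.0f}x")
# ...while every unseen tail word is pinned at frequency 0 (the underestimate).
print(f"{sum(1 for w in range(10_000, V) if w not in counts)} tail words never seen")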
I'm quite amazed by how badly this extra non-linear layer performs. Does anyone understand why it is so bad now? (trained for 10 epochs)
"This is all I changed in the net self.net = nn.Sequential(
nn.LazyLinear(10), # Lazy initialization for an input layer with 4 outputs
nn.ReLU(), # Another non-linear activation function
nn.LazyLinear(1) # Output layer producing 1 output
)
"
# Assumes the chapter's context: torch, d2l, and the model/data/trainer objects.
@d2l.add_to_class(Data)
def insert_kth_pred(self, pred, k):
    # Shift each window left by one column and append the k-th round of
    # predictions, so rows end with model output instead of observed data
    # (k is kept only for bookkeeping in the calling loop).
    self.features = torch.cat(
        [self.features[:, 1:], pred.reshape(-1, 1)], dim=1)

# Feed the model its own predictions back in, four rounds deep.
for i in range(4):
    preds = model(data.features).detach()
    data.insert_kth_pred(preds, i + 1)

# Refit on features that now contain the model's own predictions.
trainer.fit(model, data)