Sequence Models

Thanks for making this book available.

In the toy example 8.1.2, when creating the dataset, I was wondering if it was normal for both the train set and the test set to be the same. Namely:

train_iter = d2l.load_array((features[:n_train], labels[:n_train]),
                            batch_size, is_train=True)
test_iter = d2l.load_array((features[:n_train], labels[:n_train]),
                           batch_size, is_train=False)

For test_iter, I would have expected something like:

test_iter = d2l.load_array((features[n_train:], labels[n_train:]),
                           batch_size, is_train=False)
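For what it's worth, here is a minimal stand-alone sketch (plain NumPy instead of the book's d2l/torch helpers, with made-up values of `T`, `tau`, and `n_train`) showing that slicing with `n_train:` yields examples disjoint from the training ones:

```python
import numpy as np

T, tau, n_train = 1000, 4, 600

# Build the (T - tau, tau) feature matrix of lagged windows, as in 8.1.2.
x = np.sin(0.01 * np.arange(T))    # stand-in for the book's noisy sine wave
features = np.zeros((T - tau, tau))
for i in range(tau):
    features[:, i] = x[i: T - tau + i]
labels = x[tau:].reshape(-1, 1)

train_X, train_y = features[:n_train], labels[:n_train]
test_X, test_y = features[n_train:], labels[n_train:]

# The two slices partition the examples: no window appears in both sets.
print(train_X.shape, test_X.shape)   # (600, 4) (396, 4)
```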

Thanks for your time.

Great catch @dosssman! I believe we don’t need test_iter, as it is never used after being defined.

I see. I did not get that far down yet haha.

8.1.2. A Toy Example

features = d2l.zeros((T-tau, tau))
AttributeError: module 'd2l.torch' has no attribute 'zeros'
Then I searched, but found no source code for it.
I can use `features = torch.zeros((T - tau, tau))` as a replacement for now, and will try the code again next time!
:cold_face: An hour to debug!

Hi @StevenJokes, great try! Your efforts will ultimately gain traction!

for i in range(tau):
    features[:, i] = x[i: i + T - tau - max_steps + 1].T

What's the purpose of `.T` at the end of the line above? It seems to make no difference.

I couldn't agree more. Transposing a rank-1 tensor returns the tensor itself.
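Right; a quick check (NumPy shown here, but the book's code slices a 1-D tensor, where `.T` likewise has no effect):

```python
import numpy as np

v = np.arange(5)                     # a rank-1 array, shape (5,)
print(v.T.shape)                     # (5,) -- .T is a no-op on 1-D arrays
print(np.shares_memory(v, v.T))      # True: same data, nothing transposed
```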

Also this code

for i in range(n_train + tau, T):
    multistep_preds[i] = d2l.reshape(net(
        multistep_preds[i - tau: i].reshape(1, -1)), 1)

can be simply written as

for i in range(n_train + tau, T):
    multistep_preds[i] = net(multistep_preds[i - tau: i])
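A self-contained sketch of that loop (plain NumPy, with a hypothetical trained linear `net` standing in for the book's MLP), showing the simpler form works when the model accepts a 1-D window:

```python
import numpy as np

tau, T, n_train = 4, 20, 10
rng = np.random.default_rng(0)
w, b = rng.normal(size=tau), 0.1   # hypothetical "trained" parameters

def net(window):                   # accepts a 1-D window of length tau
    return window @ w + b          # returns a scalar prediction

x = np.sin(0.5 * np.arange(T))
multistep_preds = np.zeros(T)
multistep_preds[: n_train + tau] = x[: n_train + tau]

# Beyond n_train + tau, feed the model its own earlier predictions.
for i in range(n_train + tau, T):
    multistep_preds[i] = net(multistep_preds[i - tau: i])

print(multistep_preds[n_train + tau:])
```

With `torch.nn.Linear` the same shortcut applies, since a 1-D input of length `tau` is treated as a single example; the extra `reshape` calls just add and remove a batch dimension of 1.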

@ducatyb @swg104

I agree. Fixing it now.
Next time you can open a PR first.

I couldn't help but notice the similarities between the latent autoregressive model and hidden Markov models. The difference, as I understand it, is that in a latent autoregressive model the hidden state h_t may change over time t, whereas in a hidden Markov model the hidden sequence h_t remains the same for all t. Am I correct in assuming this?
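For reference, restating the latent autoregressive model as Section 8.1 formulates it (worth double-checking against the book's own equations): the model keeps a running summary h_t of the past and updates it at every step,

```
\hat{x}_t = P(x_t \mid h_t), \qquad h_t = g(h_{t-1}, x_{t-1})
```

so the hidden state is recomputed at each t rather than fixed once.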

Hi everybody,
I have a question about the math. In particular, what does the sum over x_t in Equation 8.1.4 mean?

Is that the sum over all possible states x_t? That does not make much sense to me, because if I have observed x_(t+1), there is just one possible x_t.
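As I read it, the sum is a marginalization: before anything at time t is observed, the model averages over every value x_t could take. A tiny numeric sketch (a made-up two-state transition matrix, not from the book):

```python
import numpy as np

# P[i, j] = P(next state = j | current state = i); rows sum to 1.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# P(x_{t+1} = j | x_{t-1} = i)
#   = sum over x_t of P(x_{t+1} = j | x_t) * P(x_t | x_{t-1} = i),
# which is exactly a matrix product.
P_two_step = P @ P
print(P_two_step)
print(P_two_step.sum(axis=1))   # each row still sums to 1
```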

Could someone help me in understanding that?

Thanks a lot!