I meant to say “exercise 1”, my bad.
For section 8.3, our aim is to predict the next token given the previous sequence of tokens (“the target is to predict the next token based on what tokens we have seen so far”).
Why do we shift the tokens by one offset (“hence the labels are the original sequence, shifted by one token.”)? For example, for X = [27, 28, 29, 30, 31], I expected Y to be [33, 34, 35, …], but we are taking a shifted offset. Why are we doing this, given that we want to predict the next token(s), which are 33, 34, and so on?
Both functions produce adjacent sequences based on the input. Textual sequences have a step size of 1 because we want directly adjacent words. In your case, you could give it list(range(0, 70, 2)) and your example would work just fine.
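As a minimal sketch of what this means (the variable names here are just for illustration): the labels are the features shifted by one position, so with a corpus of list(range(0, 70, 2)) each label is simply the token two greater than its feature.

# Minimal sketch: labels are the features shifted by one position.
# With corpus = list(range(0, 70, 2)), adjacent tokens differ by 2,
# so each label ends up being feature + 2.
corpus = list(range(0, 70, 2))   # [0, 2, 4, ..., 68]
step_size = 5
X = corpus[0:step_size]          # [0, 2, 4, 6, 8]
Y = corpus[1:step_size + 1]      # [2, 4, 6, 8, 10], the "next token" for each position in X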
@zhangjiekui I rewrote the author’s code to match yours:
import numpy
import torch

def seq_data_iter_sequential(corpus, batch_size, step_size, use_random_offset=True):
    num_batches = (len(corpus) - 1) // (batch_size * step_size)
    surplus = (len(corpus) - 1) % (batch_size * step_size)
    # Start with a random offset to partition a sequence; any offset in
    # [0, surplus] still leaves enough tokens for num_batches full batches
    if use_random_offset:
        index_offset = numpy.random.randint(low=0, high=surplus + 1)
    else:
        index_offset = 0
    num_tokens = ((len(corpus) - index_offset - 1) // batch_size) * batch_size
    Xs = torch.tensor(corpus[index_offset : index_offset + num_tokens])
    Ys = torch.tensor(corpus[index_offset + 1 : index_offset + num_tokens + 1])
    Xs, Ys = Xs.reshape(batch_size, -1), Ys.reshape(batch_size, -1)
    for i in range(0, num_batches * step_size, step_size):
        X = Xs[:, i : i + step_size]
        Y = Ys[:, i : i + step_size]
        yield X, Y
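For a quick sanity check (the toy corpus below is arbitrary, not from the book), the sequential iterator yields minibatches where Y is X shifted by one, and within each row consecutive batches continue where the previous one left off:

# Toy run: X and Y have shape (batch_size, step_size); the offset is
# disabled so the output is deterministic.
corpus = list(range(35))
for X, Y in seq_data_iter_sequential(corpus, batch_size=2, step_size=5,
                                     use_random_offset=False):
    print('X:', X, '\nY:', Y)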
def generate_sequence_batches_via_random_sampling(corpus, batch_size, step_size):
    # Random offset to partition a sequence
    index_offset = numpy.random.randint(low=0, high=step_size)
    num_subsequences = (len(corpus) - index_offset - 1) // step_size  # Subtracting 1 accounts for the labels
    indices = list(range(0, num_subsequences * step_size, step_size))
    numpy.random.shuffle(indices)
    # Construct batches of batch_size subsequences, sliding over the
    # shuffled indices in steps of batch_size
    num_batches = num_subsequences // batch_size
    for i in range(0, batch_size * num_batches, batch_size):
        indices_subset = indices[i : i + batch_size]  # sliding window shifted by batch_size
        # X and Y each have shape (batch_size, step_size)
        X = [corpus[index + index_offset : index + index_offset + step_size] for index in indices_subset]
        Y = [corpus[index + index_offset + 1 : index + index_offset + step_size + 1] for index in indices_subset]
        yield torch.tensor(X), torch.tensor(Y)
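A quick run (again on an arbitrary toy corpus) shows what the shuffle buys you: successive batches draw non-adjacent subsequences, so the model does not see the corpus in order:

# Toy run: because the indices are shuffled, consecutive batches are
# drawn from random positions in the corpus rather than adjacent ones.
corpus = list(range(35))
for X, Y in generate_sequence_batches_via_random_sampling(corpus,
                                                          batch_size=2,
                                                          step_size=5):
    print('X:', X, '\nY:', Y)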
Even though the shape of your outputs is the same, I think you’re missing the crucial step of shuffling the indices.