Bahdanau Attention

http://zh.d2l.ai/chapter_attention-mechanisms/bahdanau-attention.html

The shape of hidden_state is (num_layers, batch_size, num_hiddens). Why does the code use hidden_state[-1] here instead of hidden_state?

hidden_state has shape (num_layers, batch_size, num_hiddens), while the query must have shape (batch_size, 1, num_hiddens).

hidden_state[-1] extracts the top layer's hidden state, a matrix of shape (batch_size, num_hiddens). query = torch.unsqueeze(hidden_state[-1], dim=1) then inserts a middle axis, giving shape (batch_size, 1, num_hiddens), and the result is assigned to query.
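A minimal sketch of this shape change (the dimensions num_layers=2, batch_size=4, num_hiddens=16 are made up purely for illustration):

import torch

num_layers, batch_size, num_hiddens = 2, 4, 16
hidden_state = torch.zeros(num_layers, batch_size, num_hiddens)

# hidden_state[-1] is the top layer's hidden state: (batch_size, num_hiddens)
print(hidden_state[-1].shape)                    # torch.Size([4, 16])

# unsqueeze adds the "number of queries" axis: (batch_size, 1, num_hiddens)
query = torch.unsqueeze(hidden_state[-1], dim=1)
print(query.shape)                               # torch.Size([4, 1, 16])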


Why are key_size, query_size, and num_hiddens all set to num_hiddens?

Point 2 should be changed to: the encoder's all-layer hidden states at the *final* time step.

The data should be split into a training set and a test set, and the average BLEU then computed on the test set. My parameters, loss, and BLEU are as follows (better parameter settings are welcome):
embed_size, num_hiddens, num_layers, dropout = 128, 128, 4, 0.1
batch_size, num_steps = 64, 10
lr, num_epochs, device = 0.005, 100, d2l.try_gpu()
train_iter, src_vocab, tgt_vocab, test = d2l.load_data_nmt(batch_size, num_steps, num_examples=10000)
[training loss curve]
BLEU: 25
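For context, here is a rough sketch of how the settings above could be wired into this section's model and training loop. It is not the poster's exact script: it assumes the Seq2SeqAttentionDecoder class defined in this section, plus the d2l helpers Seq2SeqEncoder, EncoderDecoder, and train_seq2seq.

# Build the attention-based encoder-decoder and train it with the settings above
encoder = d2l.Seq2SeqEncoder(len(src_vocab), embed_size, num_hiddens, num_layers, dropout)
decoder = Seq2SeqAttentionDecoder(len(tgt_vocab), embed_size, num_hiddens, num_layers, dropout)
net = d2l.EncoderDecoder(encoder, decoder)
d2l.train_seq2seq(net, train_iter, lr, num_epochs, tgt_vocab, device)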

Dataset-splitting code:
from sklearn.model_selection import train_test_split
from d2l import torch as d2l

def load_data_nmt(batch_size, num_steps, num_examples=600):
    """Return the training iterator, the vocabularies, and a held-out test set."""
    text = d2l.preprocess_nmt(d2l.read_data_nmt())
    source, target = d2l.tokenize_nmt(text, num_examples)

    # Changed: instead of building the vocabularies and the iterator on the full
    # data as in the original d2l.load_data_nmt, first split off a test set
    source_train, source_test, target_train, target_test = train_test_split(
        source, target, test_size=0.1, random_state=66)
    print('len(source_train), len(source_test):', len(source_train), len(source_test))

    # Build the vocabularies from the training split only
    src_vocab = d2l.Vocab(source_train, min_freq=2,
                          reserved_tokens=['<pad>', '<bos>', '<eos>'])
    tgt_vocab = d2l.Vocab(target_train, min_freq=2,
                          reserved_tokens=['<pad>', '<bos>', '<eos>'])
    print('len(src_vocab) =', len(src_vocab))
    print('len(tgt_vocab) =', len(tgt_vocab))

    # Training set
    src_array, src_valid_len = d2l.build_array_nmt(
        source_train, src_vocab, num_steps)  # (n, num_steps), (n,)
    tgt_array, tgt_valid_len = d2l.build_array_nmt(
        target_train, tgt_vocab, num_steps)
    data_arrays = (src_array, src_valid_len, tgt_array, tgt_valid_len)
    data_iter = d2l.load_array(data_arrays, batch_size)

    # Test set: return the raw sentence pairs; they are translated later with
    # the training vocabularies, so no arrays are built here
    source_test = [' '.join(i) for i in source_test]
    target_test = [' '.join(i) for i in target_test]
    test = (source_test, target_test)
    return data_iter, src_vocab, tgt_vocab, test
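Once the model is trained, the average BLEU on the held-out pairs returned in test could be computed along these lines. This is only a sketch: it assumes d2l.predict_seq2seq and d2l.bleu as used in the sequence-to-sequence chapter, a trained model net, and an arbitrary n-gram order k=2.

source_test, target_test = test
scores = []
for src_sentence, tgt_sentence in zip(source_test, target_test):
    # Translate one source sentence with the trained model
    translation, _ = d2l.predict_seq2seq(
        net, src_sentence, src_vocab, tgt_vocab, num_steps, device)
    # BLEU of the prediction against its single reference
    scores.append(d2l.bleu(translation, tgt_sentence, k=2))
print(f'average BLEU: {sum(scores) / len(scores):.3f}')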