https://zh.d2l.ai/chapter_recurrent-neural-networks/rnn.html
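I checked the English original, and I think this passage is mistranslated: “由于训练数据中这个文本序列的下一个字符是“h”, 因此第3个时间步的损失将取决于下一个字符的概率分布, 而下一个字符是基于特征序列“m”“a”“c”和这个时间步的标签“h”生成的”
(Literally: "Since the next character of this text sequence in the training data is “h”, the loss of time step 3 will depend on the probability distribution of the next character, and the next character is generated based on the feature sequence “m”, “a”, “c” and the label “h” of this time step.")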
English original: Since the next character of the sequence in the training data is “h”, the loss of time step 3 will depend on the probability distribution of the next character generated based on the feature sequence “m”, “a”, “c” and the target “h” of this time step.
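To make my reading of the English sentence concrete, here is a minimal sketch in plain Python. It assumes the loss is cross-entropy, so the loss at time step 3 is computed from two things: the distribution the model predicts after seeing “m”, “a”, “c”, and the target “h”. The vocabulary and the logit values are made up for illustration and do not come from the book.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical character vocabulary (assumption, for illustration only).
vocab = ["m", "a", "c", "h", "i", "n", "e"]

# Hypothetical scores the model outputs at time step 3,
# after reading the feature sequence "m", "a", "c".
logits_step3 = [0.1, 0.2, 0.0, 2.0, 0.3, 0.1, 0.4]

# Predicted distribution over the next character.
probs = softmax(logits_step3)

# The target (label) of time step 3 is the true next character.
target = "h"

# Cross-entropy loss: negative log-probability assigned to the target.
loss_step3 = -math.log(probs[vocab.index(target)])

print(f"P(next = 'h' | 'm','a','c') = {probs[vocab.index(target)]:.4f}")
print(f"loss at time step 3 = {loss_step3:.4f}")
```

So the target “h” is not an input used to generate the next character. It only enters when the predicted distribution is scored, which is how I understand the English sentence.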