What is the purpose of the term $\exp\left(\min\left(0, 1-\frac{\mathrm{len}_{\text{label}}}{\mathrm{len}_{\text{pred}}}\right)\right)$ in the BLEU score?
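This term penalises predictions that are shorter than the target sequence, since a very short prediction can reach a high n-gram precision too easily. It equals 1 whenever the predicted sequence is at least as long as the target, and decays exponentially toward 0 as the prediction gets shorter. It never diverges, and it does not penalise predictions that are longer than the target.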
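To see the behaviour numerically, here is a minimal sketch that evaluates only this length term for a fixed target length of 10. The function name `brevity_penalty` and the handling of an empty prediction are my own assumptions, not something taken from the book's code.

```python
import math


def brevity_penalty(len_label: int, len_pred: int) -> float:
    """Length term of BLEU: exp(min(0, 1 - len_label / len_pred))."""
    if len_pred == 0:
        # Assumption: treat an empty prediction as the maximum penalty
        # instead of dividing by zero.
        return 0.0
    return math.exp(min(0.0, 1.0 - len_label / len_pred))


# Target length fixed at 10, prediction length varies.
for len_pred in (2, 5, 10, 20):
    print(len_pred, round(brevity_penalty(10, len_pred), 4))
# 2 0.0183
# 5 0.3679
# 10 1.0
# 20 1.0
```

So the term is capped at 1 for predictions that are long enough, and only shrinks when the prediction is shorter than the target.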
Many tricks that were used in previous sections (e.g. detach, zero grad) have been removed or ignored in this section, but I think this is just a simplified version.
how about ‘edit distance’?
Is it like concatenating the two sequences, [eng, fra], and processing them with two GRUs?
_, state = GRU1(eng_sequence, None)
outputs, _ = GRU2(fra_sequence, state)
# or
# outputs, _ = GRU2(cat([fra_sequence, state]), None)
Y_hat = linear(outputs)
loss(Y_hat, fra)
If this is correct, then could it not be an EncoderDecoder?
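For what it's worth, the first variant of the two-GRU idea above can be written as a small runnable PyTorch sketch. All sizes and variable names here are hypothetical and chosen only for illustration, the data is random, and I am assuming that each `fra` sequence already starts with a `<bos>` token so that shifting it by one position gives the decoder input and the target (teacher forcing). This is not the book's implementation, just one way to check that the shapes work out.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
vocab_eng, vocab_fra, embed, hidden = 20, 30, 8, 16
batch, len_eng, len_fra = 4, 7, 9

emb_eng = nn.Embedding(vocab_eng, embed)
emb_fra = nn.Embedding(vocab_fra, embed)
gru1 = nn.GRU(embed, hidden, batch_first=True)  # plays the encoder role
gru2 = nn.GRU(embed, hidden, batch_first=True)  # plays the decoder role
linear = nn.Linear(hidden, vocab_fra)
loss_fn = nn.CrossEntropyLoss()

# Random token ids standing in for a real eng/fra batch.
eng = torch.randint(0, vocab_eng, (batch, len_eng))
fra = torch.randint(0, vocab_fra, (batch, len_fra))

# Encoder: only the final hidden state is passed on.
_, state = gru1(emb_eng(eng))

# Decoder: initialised with the encoder state; input is fra without its
# last token, target is fra without its first token.
outputs, _ = gru2(emb_fra(fra[:, :-1]), state)
Y_hat = linear(outputs)  # (batch, len_fra - 1, vocab_fra)

# CrossEntropyLoss expects (N, C) logits and (N,) class indices.
loss = loss_fn(Y_hat.reshape(-1, vocab_fra), fra[:, 1:].reshape(-1))
```

Structurally this is already an encoder followed by a decoder: `gru1` summarises the source sequence into `state`, and `gru2` generates conditioned on it.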