http://zh-v2.d2l.ai/chapter_natural-language-processing-pretraining/subword-embedding.html
The relevant passage from the paper for the first exercise:
In order to bound the memory requirements of our model, we use a hashing function that maps n-grams to integers in 1 to K. We hash character sequences using the Fowler-Noll-Vo hashing function (specifically the FNV-1a variant). We set K = 2·10^6 below. Ultimately, a word is represented by its index in the word dictionary and the set of hashed n-grams it contains.
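To make the hashing step concrete, here is a minimal Python sketch of how an n-gram could be bucketed with FNV-1a as the passage describes (the 32-bit FNV constants are the standard ones; the function names and the exact 1-to-K offset are my own assumptions, not taken from the fastText source):

```python
def fnv1a_32(data: bytes) -> int:
    # Standard 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
    h = 0x811C9DC5                      # FNV-1a offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) % 2**32    # FNV prime, truncated to 32 bits
    return h

K = 2 * 10**6  # number of buckets, K = 2*10^6 as in the paper

def ngram_bucket(ngram: str) -> int:
    # Map an n-gram to an integer in 1..K; the +1 offset is an assumption
    # to match the paper's "integers in 1 to K".
    return fnv1a_32(ngram.encode("utf-8")) % K + 1

print(ngram_bucket("<wh"))  # e.g. the first 3-gram of "<where>"
```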
A couple of questions:
a. In fastText, subwords are only extracted at certain specified lengths. Does that mean the vocabulary ends up containing several different types of subwords? (A sketch of what I mean is below, after question b.)
b. So does that mean this subword embedding cannot be used for Asian languages, e.g. Chinese or Japanese?
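For question a, here is a quick illustration of the specified lengths: fastText extracts all character n-grams of lengths 3 through 6 from each word, after padding it with the boundary markers < and > (these defaults come from the paper; the function name is mine):

```python
def subwords(word: str, min_n: int = 3, max_n: int = 6) -> list:
    # All character n-grams of length min_n..max_n from "<word>".
    token = "<" + word + ">"
    return [token[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(token) - n + 1)]

print(subwords("where"))
# ['<wh', 'whe', 'her', 'ere', 're>', '<whe', ..., '<where', 'where>']
```

So the vocabulary mixes n-grams of several different lengths (plus the whole word itself as a special sequence), which is what I mean by "different types".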