Um, I don't think we have the right pip installs to run this section. The second cell (the first cell after the pip cells) didn't work for me right off the bat. I don't know exactly how I fixed it; I just took the pip installs from the last section, put them in, and it ran. Hearing how you fixed it, and why that works, would also be helpful if you're willing.
Hi @smizerex, can you attach a code/error snap for us to reproduce the error?
Yeah, give me a sec.
Okay, so when I leave the code like this:
I get this:
But when I copy this from 8.1:
BOOM, it works:
Hi @smizerex, please make sure your local
d2l notebooks folder is up-to-date with our GitHub repo. You can execute the following commands in your terminal to rebase your local branch:
git fetch origin master
git rebase origin/master
How could it have gotten out of sync in the first place?
What's the point of the first sorting?
# Sort according to frequencies
counter = count_corpus(tokens)
self.token_freqs = sorted(counter.items(), key=lambda x: x[0])
self.token_freqs.sort(key=lambda x: x[1], reverse=True)
I think the first sort is unnecessary. The sorting code can be simplified as
self.token_freqs = sorted(counter.items(), key=lambda x: x[1], reverse=True)
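A quick check with `collections.Counter` (standing in for `count_corpus`, which isn't shown here) confirms that a single descending sort by frequency is enough:

```python
from collections import Counter

# Toy corpus as a flat list of tokens; Counter plays the role of
# count_corpus in the book's Vocab class.
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
counter = Counter(tokens)

# One sort, descending by frequency (x[1]), replaces both sorts.
token_freqs = sorted(counter.items(), key=lambda x: x[1], reverse=True)
print(token_freqs)  # most frequent token first
```

The only thing the dropped first sort (by token string) buys is a deterministic order among tokens with equal frequency, which the Vocab class doesn't rely on.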
pip install -U git+https://github.com/d2l-ai/d2l-en.git@master
I agree. Fixing: https://github.com/d2l-ai/d2l-en/pull/1543
Why is a token with frequency >= min_freq a unique token?
uniq_tokens += [token for token, freq in self.token_freqs
if freq >= min_freq and token not in uniq_tokens]
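A self-contained sketch of what that loop is doing (the helper name and `reserved_tokens` parameter here are my own, not the book's exact code): tokens seen fewer than `min_freq` times are dropped from the vocabulary (they will map to `<unk>`), and the `token not in uniq_tokens` check only guards against duplicating tokens already added as reserved tokens:

```python
from collections import Counter

def build_uniq_tokens(tokens, min_freq=2, reserved_tokens=None):
    # Hypothetical helper mirroring the Vocab logic: '<unk>' and any
    # reserved tokens come first, then every token whose corpus
    # frequency is at least min_freq.
    reserved_tokens = reserved_tokens or []
    counter = Counter(tokens)
    token_freqs = sorted(counter.items(), key=lambda x: x[1], reverse=True)
    uniq_tokens = ["<unk>"] + reserved_tokens
    uniq_tokens += [token for token, freq in token_freqs
                    if freq >= min_freq and token not in uniq_tokens]
    return uniq_tokens

tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
print(build_uniq_tokens(tokens, min_freq=2))
```

So "unique" just means "gets its own index in the vocabulary"; rare tokens below the threshold all share the `<unk>` index.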
I was wondering if we can use
nltk package for this.
- TreebankWordTokenizer: it separates words using spaces and punctuation.
- PunktWordTokenizer: it does not separate punctuation from the word.
- WordPunctTokenizer: it separates punctuation from the word.
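For anyone who wants to see the word-vs-punctuation split without installing nltk, a stdlib regex gives a close approximation of the WordPunctTokenizer behavior (this is a sketch, not nltk's exact implementation):

```python
import re

text = "Don't stop, it works!"

# Plain whitespace split keeps punctuation attached to the words.
print(text.split())

# WordPunctTokenizer-style split: runs of word characters, or runs of
# punctuation, become separate tokens.
print(re.findall(r"\w+|[^\w\s]+", text))
```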
- It decreased exponentially