Natural Language Inference: Fine-Tuning BERT

https://d2l.ai/chapter_natural-language-processing-applications/natural-language-inference-bert.html

Hello, thanks for the nice work!
I want to fine-tune BERT on unlabeled newspaper text data, and then use the fine-tuned BERT model with the BERT embedding from https://bert-embedding.readthedocs.io/en/latest/api_reference/bert_embedding.html#bert_embedding.bert.BertEmbedding. First, is this possible? Second, how do I fine-tune BERT? I can follow https://d2l.ai/chapter_natural-language-processing-applications/natural-language-inference-bert.html for fine-tuning, but I don’t understand how to change this part:

from mxnet.gluon import nn

class BERTClassifier(nn.Block):
    def __init__(self, bert):
        super(BERTClassifier, self).__init__()
        self.encoder = bert.encoder  # reuse the pre-trained BERT encoder
        self.hidden = bert.hidden    # reuse the pre-trained hidden (pooler) layer
        self.output = nn.Dense(3)    # new output layer with 3 classes (for SNLI)

    def forward(self, inputs):
        tokens_X, segments_X, valid_lens_x = inputs
        encoded_X = self.encoder(tokens_X, segments_X, valid_lens_x)
        # Feed the representation of the "<cls>" token into the classification head
        return self.output(self.hidden(encoded_X[:, 0, :]))

How do I save the fine-tuned model so I can use it for BERT embedding?

For the BERT embedding in the link you provided, I think you can refer to the GluonNLP documentation.
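If it helps, here is a minimal sketch of the basic usage of that bert_embedding package, based on its documentation (the example sentence is made up; please check the linked API reference for the constructor arguments that select a particular model or parameter file):

from bert_embedding import BertEmbedding

# One sentence per list entry.
sentences = ["the new government announced its budget today"]

# The default constructor loads a pre-trained BERT; see the linked docs
# for arguments that choose a different model or dataset.
bert_embedding = BertEmbedding()

# Returns one (tokens, token_embeddings) pair per input sentence.
tokens, token_embeddings = bert_embedding(sentences)[0]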

For educational purposes (to keep the code simple), in D2L we pre-trained BERT on whole-word tokens (not WordPiece tokens), and the pre-trained model is available for download. You may follow this section’s code to see how to load it. In fact, its performance is quite close to the original BERT on several downstream tasks. So, if you want to use BERT embeddings for whole-word tokens, feel free to modify this section’s code to load the pre-trained parameters and fine-tune them. For saving models, please refer to https://d2l.ai/chapter_deep-learning-computation/read-write.html or the official documentation of your framework.
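As a rough sketch of those two steps (building the classifier on top of the downloaded pre-trained BERT, then saving and reloading the fine-tuned parameters with Gluon), assuming the bert and devices variables defined earlier in that section’s code and an example file name:

# `bert` and `devices` come from earlier code in the linked section.
net = BERTClassifier(bert)
# Only the new output layer needs fresh initialization; the encoder and
# hidden layer already hold the pre-trained parameters.
net.output.initialize(ctx=devices)

# ... fine-tune `net` as shown in the section ...

# Save the fine-tuned parameters to disk (example file name).
net.save_parameters('bert.finetuned.params')

# Later: rebuild the same architecture and load the saved parameters.
net2 = BERTClassifier(bert)
net2.load_parameters('bert.finetuned.params', ctx=devices)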