Natural Language Inference: Fine-Tuning BERT

https://d2l.ai/chapter_natural-language-processing-applications/natural-language-inference-bert.html

It says: "These two loss functions are irrelevant to fine-tuning downstream applications, thus the parameters of the employed MLPs in MaskLM and NextSentencePred are not updated (staled) when BERT is fine-tuned."

How is this achieved? Could you please share some sample code for this trick?

The model used here doesn't include the MLP layers for MLM and NSP, so they are simply never part of the fine-tuned network.
Btw, if you want to "freeze" any layer, you can set its parameters' requires_grad to False; this is more like transfer learning, like this:

from torch import nn

class BERTClassifier(nn.Module):
    def __init__(self, bert):
        super(BERTClassifier, self).__init__()
        # Reuse the pretrained encoder and hidden layer, but freeze them by
        # turning off gradient tracking for their parameters.
        self.encoder = bert.encoder
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.hidden = bert.hidden
        for param in self.hidden.parameters():
            param.requires_grad = False
        # Only this output layer (3 NLI classes) is trained.
        self.output = nn.LazyLinear(3)

    def forward(self, inputs):
        tokens_X, segments_X, valid_lens_x = inputs
        encoded_X = self.encoder(tokens_X, segments_X, valid_lens_x)
        # Classify from the encoding of the <cls> token
        return self.output(self.hidden(encoded_X[:, 0, :]))
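
With the layers frozen like this, you can also hand only the trainable parameters to the optimizer, so the frozen encoder and hidden layer are skipped entirely. A minimal sketch (assuming bert is the pretrained model loaded earlier; the learning rate is arbitrary):

import torch

net = BERTClassifier(bert)
# Only parameters that still require gradients (here, just self.output)
# are passed to the optimizer, so the frozen layers are never updated.
trainable_params = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-4)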

On Windows, I'm experiencing the following error.
Does anyone know how to fix it?

devices = d2l.try_all_gpus()
bert, vocab = load_pretrained_model(
    'bert.small', num_hiddens=256, ffn_num_hiddens=512, num_heads=4,
    num_blks=2, dropout=0.1, max_len=512, devices=devices)

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_7860/847789108.py in <module>
      2 bert, vocab = load_pretrained_model(
      3     'bert.small', num_hiddens=256, ffn_num_hiddens=512, num_heads=4,
----> 4     num_blks=2, dropout=0.1, max_len=512, devices=devices)

/tmp/ipykernel_7860/325226183.py in load_pretrained_model(pretrained_model, num_hiddens, ffn_num_hiddens, num_heads, num_blks, dropout, max_len, devices)
      9     bert = d2l.BERTModel(
     10         len(vocab), num_hiddens, ffn_num_hiddens=ffn_num_hiddens, num_heads=4,
---> 11         num_blks=2, dropout=0.2, max_len=max_len)
     12     # Load pretrained BERT parameters
     13     bert.load_state_dict(torch.load(os.path.join(data_dir,

TypeError: __init__() got an unexpected keyword argument 'num_blks'
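
This error usually means the installed d2l package is older than the current version of the book, so its BERTModel constructor doesn't accept a num_blks keyword. A quick diagnostic sketch (just a suggestion, not from the book) to see which keyword names your installed version actually expects:

import inspect
from d2l import torch as d2l

# Print the constructor signature of the installed d2l.BERTModel to check
# whether it accepts `num_blks` or uses a different keyword name.
print(inspect.signature(d2l.BERTModel.__init__))

If the signature differs, upgrading d2l to the version used by the online book, or renaming the keyword to match the installed signature, should fix the mismatch.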

You can manually decompress the SNLI archive in the data folder after downloading it.
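
For example, you can extract it in place with Python's standard library. A minimal sketch (the ../data directory and the snli_1.0.zip file name are assumptions about where the downloader puts the archive; adjust to your setup):

import os
import zipfile

# Assumed download location and archive name; change them to match your setup.
data_dir = os.path.join('..', 'data')
archive = os.path.join(data_dir, 'snli_1.0.zip')

# Extract next to the zip file so the SNLI reading code can find the
# decompressed snli_1.0 folder.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(data_dir)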

Makes sense. Actually, there are two components that are not used, so I think neither will be updated automatically; if needed, you can also disable them manually.
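
One way to convince yourself: after a forward and backward pass through the classifier, the heads that never appeared in the forward graph have no gradients, so an optimizer step cannot change them. A minimal sketch (assuming bert and vocab from load_pretrained_model, the BERTClassifier above, that the two heads are stored as bert.mlm and bert.nsp, and a made-up dummy batch; run on CPU, or move the tensors to bert's device):

import torch

net = BERTClassifier(bert)
# Hypothetical dummy batch: token ids, segment ids, and valid lengths.
tokens_X = torch.randint(0, len(vocab), (2, 128))
segments_X = torch.zeros((2, 128), dtype=torch.long)
valid_lens_x = torch.tensor([128, 128])

loss = net((tokens_X, segments_X, valid_lens_x)).sum()  # dummy scalar "loss"
loss.backward()

# The MLM and NSP heads never appear in the classifier's forward pass,
# so they receive no gradients and an optimizer step leaves them unchanged.
for name, param in bert.named_parameters():
    if name.startswith(('mlm.', 'nsp.')):
        assert param.grad is None, name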

A few questions:

  1. In the last section, "Pretraining BERT", the model was pretrained on pairs of consecutive sentences from wikitext-2, while here it is fine-tuned on NLI sentence pairs (which may not be consecutive). Does the same approach fit both cases?