https://d2l.ai/chapter_natural-language-processing-applications/natural-language-inference-bert.html
It says: "These two loss functions are irrelevant to fine-tuning downstream applications, thus the parameters of the employed MLPs in MaskLM and NextSentencePred are not updated (staled) when BERT is fine-tuned."
How is this achieved? Could you please share some sample code for this trick?
The model used here doesn't include the MLP layers for MLM and NSP, so their parameters are simply not part of the fine-tuned network.
By the way, if you want to "freeze" any layer, you can set its parameters' requires_grad to False; that is more like transfer learning. For example:
from torch import nn

class BERTClassifier(nn.Module):
    def __init__(self, bert):
        super(BERTClassifier, self).__init__()
        # Reuse the pretrained encoder and freeze its parameters
        self.encoder = bert.encoder
        for param in self.encoder.parameters():
            param.requires_grad = False
        # Reuse the pretrained hidden MLP and freeze it as well
        self.hidden = bert.hidden
        for param in self.hidden.parameters():
            param.requires_grad = False
        # Only this new output head (3 NLI classes) is trained
        self.output = nn.LazyLinear(3)

    def forward(self, inputs):
        tokens_X, segments_X, valid_lens_x = inputs
        encoded_X = self.encoder(tokens_X, segments_X, valid_lens_x)
        # Classify based on the representation of the '<cls>' token
        return self.output(self.hidden(encoded_X[:, 0, :]))
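As a quick sanity check, here is a minimal sketch (assuming bert has been loaded as in the chapter and net = BERTClassifier(bert)) of how to confirm which parameters will actually be updated:

import torch

net = BERTClassifier(bert)

# Only the freshly added output head should report requires_grad=True;
# the frozen encoder and hidden MLP report False.
for name, param in net.named_parameters():
    print(name, param.requires_grad)

# If you prefer, you can also hand the optimizer only the trainable
# parameters (with nn.LazyLinear, run one forward pass first so its
# weights are materialized), e.g.:
#   trainer = torch.optim.Adam(
#       (p for p in net.parameters() if p.requires_grad), lr=1e-4)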
devices = d2l.try_all_gpus()
bert, vocab = load_pretrained_model(
    'bert.small', num_hiddens=256, ffn_num_hiddens=512, num_heads=4,
    num_blks=2, dropout=0.1, max_len=512, devices=devices)

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_7860/847789108.py in <module>
      2 bert, vocab = load_pretrained_model(
      3     'bert.small', num_hiddens=256, ffn_num_hiddens=512, num_heads=4,
----> 4     num_blks=2, dropout=0.1, max_len=512, devices=devices)

/tmp/ipykernel_7860/325226183.py in load_pretrained_model(pretrained_model, num_hiddens, ffn_num_hiddens, num_heads, num_blks, dropout, max_len, devices)
      9     bert = d2l.BERTModel(
     10         len(vocab), num_hiddens, ffn_num_hiddens=ffn_num_hiddens, num_heads=4,
---> 11         num_blks=2, dropout=0.2, max_len=max_len)
     12     # Load pretrained BERT parameters
     13     bert.load_state_dict(torch.load(os.path.join(data_dir,

TypeError: __init__() got an unexpected keyword argument 'num_blks'
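If I had to guess, this looks like a version mismatch: older releases of the d2l package define BERTModel without a num_blks argument, while the online chapter has been updated. A minimal way to check what your installed version actually expects:

import inspect
from d2l import torch as d2l

# Print the constructor signature of the installed d2l.BERTModel.
# If it lists something like num_layers instead of num_blks, either pass
# the argument names it expects or upgrade d2l to match the online chapter.
print(inspect.signature(d2l.BERTModel.__init__))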
You can manually decompress the downloaded SNLI archive in the data folder.
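For example, a minimal sketch using Python's standard zipfile module (the archive name snli_1.0.zip and the ../data directory are assumptions; adjust them to wherever the download actually landed):

import os
import zipfile

# Assumed locations: d2l usually downloads SNLI into ../data as snli_1.0.zip
data_dir = os.path.join('..', 'data')
archive = os.path.join(data_dir, 'snli_1.0.zip')

# Extract in place so the data loader can find ../data/snli_1.0
with zipfile.ZipFile(archive, 'r') as f:
    f.extractall(data_dir)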
Makes sense. Actually, there are two components that are not used, so I think neither will be updated automatically; in any case, you can also disable them manually.
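A minimal sketch of disabling them explicitly, assuming the pretrained model exposes the two heads as bert.mlm and bert.nsp (the attribute names used in the book's BERTModel; adjust if yours differ):

# Freeze the masked-language-model and next-sentence-prediction heads
# so they cannot receive gradient updates even if accidentally included.
for head in (bert.mlm, bert.nsp):
    for param in head.parameters():
        param.requires_grad = False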
Some questions:
- In the last section, "Pretraining BERT", consecutive sentence pairs from WikiText-2 were used to train the model, while here NLI sentence pairs (which may not be consecutive) are used for fine-tuning. Does this work for both cases?