Fine-Tuning BERT for Sequence-Level and Token-Level Applications

https://d2l.ai/chapter_natural-language-processing-applications/finetuning-bert.html

Exercise 1: One idea would be to create negative samples by picking, for each query, a random article other than the labelled one and labelling that pair with 0. The training dataset for BERT would then consist of (query, article) pairs, each with a target of 1 or 0.
Finally, to get the ranking, we would run the model for a given query against each of the n articles and sort the articles by the model's softmax output for the positive class. That order is the relevance ranking.
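
To make the idea above concrete, here is a minimal sketch in plain Python. The names `build_pairs`, `rank_articles` and `overlap_score` are hypothetical, and `overlap_score` is only a toy stand-in for the fine-tuned BERT classifier's probability of the positive class; it is not anything from the book.

```python
import random

def build_pairs(labelled, articles, num_neg=1, seed=0):
    """Build (query, article_text, label) training triples.

    labelled: list of (query, positive_article_id)
    articles: dict mapping article_id -> article text
    """
    rng = random.Random(seed)
    pairs = []
    for query, pos_id in labelled:
        # Positive pair: the labelled article for this query.
        pairs.append((query, articles[pos_id], 1))
        # Negative pairs: random articles other than the labelled one.
        candidates = [a for a in articles if a != pos_id]
        for neg_id in rng.sample(candidates, min(num_neg, len(candidates))):
            pairs.append((query, articles[neg_id], 0))
    return pairs

def rank_articles(query, articles, score_fn):
    """Score every article against the query and sort by descending score."""
    scored = [(score_fn(query, text), art_id) for art_id, text in articles.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [art_id for _, art_id in scored]

def overlap_score(query, text):
    """Toy scorer (assumption): fraction of query words found in the article.
    In practice this would be the fine-tuned model's softmax output for class 1."""
    q_words = set(query.split())
    return len(q_words & set(text.split())) / len(q_words)

articles = {
    "a": "bert fine tuning",
    "b": "cooking pasta",
    "c": "bert pretraining",
}
pairs = build_pairs([("bert fine tuning", "a")], articles, num_neg=2)
ranking = rank_articles("bert fine tuning", articles, overlap_score)
print(ranking)
```

One caveat with random negatives: some of them may actually be relevant to the query, so the 0 labels are noisy.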

For the question-answering task, in my opinion, the end of the answer span should be predicted relative to the start position (or the start token). But here the book uses 3 independent FC layers for inference, which I do not really agree with.
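
For what it is worth, even when the start and end scores come from independent layers, the two can still be coupled at decoding time by only considering spans whose end is not before the start. Below is a small sketch of that constraint in plain Python. The function name `best_span`, the `max_len` parameter and the score values are assumptions for illustration, not the book's code, and this does not make the end prediction depend on the start inside the model itself.

```python
def best_span(start_scores, end_scores, max_len=None):
    """Return (start, end) maximizing start_scores[i] + end_scores[j] with i <= j.

    Assumes both lists are non-empty and have the same length.
    max_len (optional) caps the number of tokens in the span.
    """
    best = None
    n = len(end_scores)
    for i, s in enumerate(start_scores):
        # Only consider end positions at or after the start position.
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            total = s + end_scores[j]
            if best is None or total > best[0]:
                best = (total, i, j)
    return best[1], best[2]

# Made-up per-token scores for a 3-token passage.
start_scores = [0.1, 2.0, 0.3]
end_scores = [1.5, 0.2, 1.0]

# Independent argmax would give start=1, end=0, which is not a valid span.
naive = (start_scores.index(max(start_scores)), end_scores.index(max(end_scores)))
span = best_span(start_scores, end_scores)
print(naive, span)
```

The double loop is quadratic in the passage length, which is usually acceptable when `max_len` is small.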

Would it be possible to also provide example code for fine-tuning on text classification, text tagging, and QA?

Hi, the dense layer's weights are shared, right?