Somewhere, in some video, I heard the idea of freezing the embedding matrix while fine-tuning a pre-trained language model. I have completely forgotten where it was said, but since then I have usually done this. I decided to check whether it is actually needed, and I will share the results with you in this article. I found a similar experiment described in this post, but it used GloVe vectors to initialize the embedding matrix of a BiLSTM that was not pre-trained, and it did not explore the difference between the validation and training vocabularies. I also found some papers: “To Tune or Not to Tune? Adapting…
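
For readers who have not seen it before, here is a minimal sketch of what freezing an embedding matrix can look like in PyTorch. It is not the setup from the experiment: `TinyClassifier`, its sizes, and the random batch are hypothetical stand-ins for a real pre-trained model and dataset, used only to show the mechanics.

```python
import torch
from torch import nn

torch.manual_seed(0)


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a pre-trained model: embeddings plus a linear head."""

    def __init__(self, vocab_size=100, dim=16, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, token_ids):
        # Mean-pool the token embeddings, then classify.
        return self.head(self.embedding(token_ids).mean(dim=1))


model = TinyClassifier()

# Freeze the embedding matrix: autograd will not compute gradients for it.
model.embedding.weight.requires_grad_(False)

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# Snapshots for checking what changed after the update.
embedding_before = model.embedding.weight.detach().clone()
head_before = model.head.weight.detach().clone()

# One training step on random data (assumed shapes: batch of 8, 12 tokens each).
token_ids = torch.randint(0, 100, (8, 12))
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(token_ids), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

embedding_unchanged = torch.equal(embedding_before, model.embedding.weight)
head_changed = not torch.equal(head_before, model.head.weight)
print(embedding_unchanged, head_changed)
```

With a Hugging Face `transformers` model, the same idea should apply to the module returned by `model.get_input_embeddings()`, although the exact layers worth freezing depend on the architecture.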

Note: If you have labeled data and are not planning to operate directly on words, a pre-trained network such as KoBERT, or the even fancier KoELECTRA, is probably your solution. If you are looking for a “lighter” approach, you can build your own vocabulary using something from HuggingFace’s tokenizers library and train whatever you want.



A few weeks ago I had to process a few Asian languages: Chinese, Japanese, and Korean. To work with them properly, I first had to understand their structure (how words are formed and how they in turn form sentences). …

Transformer-based models have taken a leading role in NLP today. In most cases, using pre-trained encoder architectures for downstream tasks achieves very high scores. The main idea of this approach is to train a large model on a large amount of unlabeled data and then add a few layers on top of it for text classification, coreference resolution, question answering, and so on. However, although such models give good results, they are still black boxes, whereas the interpretability of a model is very important for debugging and for understanding how the model makes a decision.

A couple of weeks…

