1. Kalchbrenner, N., Grefenstette, E., and Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. arXiv preprint arXiv:1404.2188.
2. Liu, P., Qiu, X., and Huang, X. (2016, July 9–15). Recurrent Neural Network for Text Classification with Multi-Task Learning. Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), New York, NY, USA.
3. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv preprint arXiv:1909.11942.
4. Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2020, July 5–10). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.
5. Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, June 2–7). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA.