1. O. Agarwal, H. Ge, S. Shakeri, and R. Al-Rfou. “Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training”. Mar. 13, 2021. arXiv: 2010.12688.
2. A. Aghajanyan, A. Shrivastava, A. Gupta, N. Goyal, L. Zettlemoyer, and S. Gupta. “Better Fine-Tuning by Reducing Representational Collapse”. Aug. 6, 2020. arXiv: 2008.03156.
3. J. Ainslie, S. Ontanon, C. Alberti, P. Pham, A. Ravula, and S. Sanghai. “ETC: Encoding Long and Structured Inputs in Transformers”. 2020. arXiv: 2004.08483.
4. A. Alvi. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model. Microsoft Research. Oct. 11, 2021. url: https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/ (visited on 11/12/2021).
5. A. Askell et al. “A General Language Assistant as a Laboratory for Alignment”. Dec. 9, 2021. arXiv: 2112.00861.