1. Han, X., et al.: Pre-trained models: past, present and future. arXiv preprint arXiv:2106.07139 (2021)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019)
3. Peters, M.E., et al.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics (2018)
4. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
5. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training. OpenAI (2018)