1. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT, pp. 4171–4186 (2019)
2. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: ACL (2018)
3. Joshi, M., Chen, D., Liu, Y., Weld, D.S., Zettlemoyer, L., Levy, O.: SpanBERT: improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Linguist. 8, 64–77 (2020)
4. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: a lite BERT for self-supervised learning of language representations. In: ICLR (2020)
5. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)