1. J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, 2019, pp. 4171–4186.
2. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A Robustly Optimized BERT Pretraining Approach, arXiv preprint arXiv:1907.11692, 2019.
3. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models are Unsupervised Multitask Learners, OpenAI Technical Report, 2019.
4. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q.V. Le, XLNet: Generalized Autoregressive Pretraining for Language Understanding, in: Advances in Neural Information Processing Systems, NeurIPS, 2019.
5. Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, arXiv preprint arXiv:1909.11942, 2019.