1. Matthew E. Peters, et al., Deep Contextualized Word Representations, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018.
2. Alec Radford, et al., Improving Language Understanding by Generative Pre-Training, OpenAI Technical Report, 2018.
3. Jacob Devlin, Ming-Wei Chang, Kenton Lee, et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of NAACL-HLT, 2019.
4. Yinhan Liu, et al., RoBERTa: A Robustly Optimized BERT Pretraining Approach, arXiv preprint, 2019.
5. Zhenzhong Lan, et al., ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, arXiv preprint, 2019.