1. Deep contextualized word representations;Peters;NAACL,2018
2. Language models are unsupervised multitask learners;Radford;OpenAI blog,2019
3. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;NAACL,2019
4. RoBERTa: A robustly optimized BERT pretraining approach;Liu;arXiv preprint,2019
5. XLNet: Generalized autoregressive pretraining for language understanding;Yang;NeurIPS,2019