1. Improving language understanding by generative pre-training;Radford;OpenAI Technical Report,2018
2. BERT: pre-training of deep bidirectional transformers for language understanding;Devlin;arXiv,2018
3. Language models are unsupervised multitask learners;Radford;OpenAI Blog,2019
4. Deep contextualized word representations;Peters;Proc. NAACL-HLT,2018
5. Cross-lingual language model pretraining;Conneau;Adv. Neural Inf. Process. Syst.,2019