1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding;Devlin,2018
2. RoBERTa: A Robustly Optimized BERT Pretraining Approach;Liu,2019
3. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations;Lan,2019
4. Language Models are Few-Shot Learners;Brown,2020
5. LLaMA: Open and Efficient Foundation Language Models;Touvron,2023