1. BERT: Pretraining of deep bidirectional transformers for language understanding;Devlin
2. Roberta: A robustly optimized bert pretraining approach;Liu,2019
3. Improving language understanding by generative pre-training;Radford,2018
4. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel;The Journal of Machine Learning Research,2020
5. Language models are few-shot learners;Brown;Advances in neural information processing systems,2020