1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin,2018
2. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension;Lewis,2019
3. RoBERTa: A robustly optimized BERT pretraining approach;Liu,2019
4. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel;The Journal of Machine Learning Research,2020