1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin
2. Roberta: A robustly optimized bert pretraining approach;Liu,2019
3. Exploring the limits of transfer learning with a unified text-to-text transformer;Raffel;arXiv preprint arXiv:1910.10683,2019
4. Language models are few-shot learners;Brown;arXiv preprint arXiv:2005.14165,2020
5. On the opportunities and risks of foundation models;Bommasani;arXiv preprint arXiv:2108.07258,2021