1. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst., 33, 1877–1901.
2. Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2–7). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA.
3. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv.
4. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P.J. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res., 21, 1–67.
5. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst., 35.