1. BERT: Pre-training of deep bidirectional transformers for language understanding;DEVLIN,2019
2. Exploring the limits of transfer learning with a unified text-to-text transformer;RAFFEL;Journal of Machine Learning Research,2020
3. Scaling instruction-finetuned language models;CHUNG;Journal of Machine Learning Research,2024