1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA; 2017. p. 5998–6008.
2. Devlin J, Chang M, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: NAACL-HLT (1). Association for Computational Linguistics; 2019. p. 4171–4186.
3. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J Mach Learn Res. 2020;21(140):1–67.
4. Scao TL, Fan A, Akiki C, Pavlick E, Ilic S, Hesslow D, et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. CoRR. 2022;abs/2211.05100.
5. Wang Z, Li M, Xu R, Zhou L, Lei J, Lin X, et al. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners. CoRR. 2022;abs/2205.10747.