1. Attention is all you need;Vaswani;Adv. Neural Inf. Process. Syst.,2017
2. Pre-trained models for natural language processing: A survey;Qiu;Sci. China Technol. Sci.,2020
3. A survey of transformers;Lin;AI Open,2022
4. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of NAACL-HLT, 2019, pp. 4171–4186.
5. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, in: ICLR, 2021.