1. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, and others, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
2. Vaswani and others, “Attention is all you need,” Adv. Neural Inf. Process. Syst., 2017.
3. Gu and others, “Recent advances in convolutional neural networks,” Pattern Recognit., 2018.
4. Khan and others, “Transformers in vision: a survey,” ACM Comput. Surv. (CSUR), 2022.
5. Pandey and others, “Are vision transformers more data hungry than newborn visual systems?,” Adv. Neural Inf. Process. Syst., 2024.