1. Chai, J., Zeng, H., Li, A. & Ngai, E. W. Deep learning in computer vision: A critical review of emerging techniques and application scenarios. Mach. Learn. with Appl. 6, 100134 (2021).
2. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D. et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
3. Khan, S. et al. Transformers in vision: A survey. ACM Comput. Surv. 54, 1–41 (2022).
4. Han, K., Wang, Y., Chen, H., Chen, X. et al. A survey on vision transformer. IEEE Trans. on Pattern Anal. & Mach. Intell. 45, 87–110 (2022).
5. Raghu, M., Unterthiner, T., Kornblith, S., Zhang, C. & Dosovitskiy, A. Do vision transformers see like convolutional neural networks? Adv. Neural. Inf. Process. Syst. 34, 12116–12128 (2021).