1. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
2. A. Dosovitskiy, et al., An image is worth 16x16 words: Transformers for image recognition at scale, 2020.
3. A. Vaswani, et al., Attention is all you need, CoRR, 2017.
4. J. Li, et al., ConvMLP: Hierarchical convolutional MLPs for vision, 2021.
5. A. Roy, et al., Efficient content-based sparse attention with routing transformers, Trans. Assoc. Comput. Linguist., 2021.