1. Vaswani et al., Attention is all you need, Adv. Neural Inf. Process. Syst., 2017.
2. Dosovitskiy et al., An image is worth 16x16 words: Transformers for image recognition at scale, 2021.
3. Carion et al., End-to-end object detection with transformers, 2020.
4. Xie et al., SegFormer: Simple and efficient design for semantic segmentation with transformers, Adv. Neural Inf. Process. Syst., 2021.
5. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.