1. Bo, D., Wenhai, W., Deng-Ping, F., Jinpeng, L., Huazhu, F., Ling, S., 2023. Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers. In: CAAI AIR.
2. End-to-end object detection with transformers;Carion,2020
3. Twins: Revisiting the design of spatial attention in vision transformers;Chu;Adv. Neural Inf. Process. Syst.,2021
4. Imagenet: A large-scale hierarchical image database;Deng,2009
5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N., 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: ICLR.