1. Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M., & Schmid, C. (2021). ViViT: A Video Vision Transformer. arXiv: 2103.15691.
2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. arXiv: 1706.03762.
3. Carion, N., et al. (2020). End-to-End Object Detection with Transformers.
4. Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., & Zhou, Y. (2021). TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv: 2102.04306.
5. Chen et al. (2020). Vehicle Re-Identification Using Distance-Based Global and Partial Multi-Regional Feature Learning. IEEE Transactions on Intelligent Transportation Systems.