1. Wu, B., Xu, C., Dai, X., et al.: Visual transformers: Token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677 (2020)
2. Zheng, S., Lu, J., Zhao, H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6881–6890. IEEE, USA (2021)
3. Geirhos, R., Rubisch, P., Michaelis, C., et al.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2019)
4. Tuli, S., Dasgupta, I., Grant, E., et al.: Are convolutional neural networks or transformers more like human vision? In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 43 (2021)
5. You, Q., Jin, H., Luo, J.: Visual sentiment analysis by attending on local image regions. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 231–237. AAAI Press, USA (2017)