1. Ali, A., et al.: XCiT: cross-covariance image transformers. In: NeurIPS, vol. 34 (2021)
2. Bello, I.: LambdaNetworks: modeling long-range interactions without attention. arXiv preprint arXiv:2102.08602 (2021)
3. Berman, M., Jégou, H., Vedaldi, A., Kokkinos, I., Douze, M.: MultiGrain: a unified image embedding for classes and instances. arXiv preprint arXiv:1902.05509 (2019)
4. Brock, A., De, S., Smith, S.L., Simonyan, K.: High-performance large-scale image recognition without normalization. arXiv preprint arXiv:2102.06171 (2021)
5. Chen, C.F., Fan, Q., Panda, R.: CrossViT: cross-attention multi-scale vision transformer for image classification. In: ICCV (2021)