1. Vaswani A. Shazeer N. Parmar N. et al.:Attention is all you need. In:Advances in Neural Information Processing Systems pp.5998–6008(2017)
2. Dosovitskiy A. Beyer L. Kolesnikov A. et al.:an image is worth 16×16 words: Transformers for image recognition at scale. In:International Conference on Learning Representations (ICLR) Vienna Austria(2021)
3. Carion N. Massa F. Synnaeve G. Usunier N. Kirillov A. Zagoruyko S.:End‐to‐end object detection with Transformers. In:Computer Vision – ECCV 2020. pp.213–229 Springer Cham(2020)
4. Chen M. Radford A. Child R. Wu J. Jun H. Luan D. Sutskever I.:Generative pretraining from pixels. In:International Conference on Machine Learning pp.1691–1703 Online(2020)
5. Han K. Wang Y. Chen H. et al.:A Survey on visual transformer. arXiv.2012.12556 (2022)