1. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in neural information processing systems. 2017: 5998-6008.
2. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
3. Peng Z, Huang W, Gu S, et al. Conformer: Local features coupling global representations for visual recognition[J]. arXiv preprint arXiv:2105.03889, 2021.
4. Strudel R, Garcia R, Laptev I, et al. Segmenter: Transformer for semantic segmentation[J]. arXiv preprint arXiv:2105.05633, 2021.
5. Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[J]. arXiv preprint arXiv:2103.14030, 2021.