1. Attention is all you need;Vaswani;Adv Neural Inf Process Syst,2017
2. An image is worth 16x16 words: transformers for image recognition at scale;Dosovitskiy;arXiv,2020
3. Scaling vision transformers;Zhai,2022
4. MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer;Mehta;arXiv,2021
5. Swin transformer: hierarchical vision transformer using shifted windows;Liu,2021