1. A convnet for the 2020s;Liu,2022
2. Convnext v2: co-designing and scaling convnets with masked autoencoders;Woo,2023
3. An image is worth 16×16 words: transformers for image recognition at scale;Dosovitskiy,2021
4. Swin transformer: hierarchical vision transformer using shifted windows;Liu,2021
5. Video transformers: a survey;Selva;IEEE Trans. Pattern Anal. Mach. Intell.,2023