1. mixup: Beyond empirical risk minimization;hongyi;ICLR,2018
2. Cutmix: Regularization strategy to train strong classifiers with localizable features;yun;ICCV,2019
3. Tokens-to-token vit: Training vision transformers from scratch on imagenet;li;ArXiv Preprint,2021
4. Multi-scale context aggregation by dilated convolutions;yu;ArXiv Preprint,2015
5. Deepvit: Towards deeper vision transformer;zhou;ArXiv Preprint,2021