1. Dosovitskiy et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
2. Vaswani et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
3. Bao et al. BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254, 2021.
4. Chen et al. Generative pretraining from pixels. International Conference on Machine Learning, 2020.
5. Ronneberger et al. U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015.