1. Attention is all you need;Vaswani;Adv. Neural Inf. Process. Syst.,2017
2. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020
3. Imagenet classification with deep convolutional neural networks;Krizhevsky;Adv. Neural Inf. Process. Syst.,2012
4. Gradient-based learning applied to document recognition;LeCun;Proc. IEEE,1998
5. Very deep convolutional networks for large-scale image recognition;Simonyan,2014