1. Devlin, J., et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol., 2019.
2. Vaswani, A., et al., "Attention is all you need," Proc. Adv. Neural Inf. Process. Syst., 2017.
3. Liu, Z., et al., "Swin Transformer: Hierarchical vision transformer using shifted windows," Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021.
4. Dosovitskiy, A., et al., "An image is worth 16×16 words: Transformers for image recognition at scale," Proc. Int. Conf. Learn. Represent., 2021.
5. Tan, M., et al., "EfficientNet: Rethinking model scaling for convolutional neural networks," Proc. Int. Conf. Mach. Learn., 2019.