1. Training data-efficient image transformers & distillation through attention;touvron;ICML,2021
2. An image is worth 16x16 words: Transformers for image recognition at scale;dosovitskiy;ICLR,2021
3. Masked autoencoders that listen;huang;CoRR,2022
4. A ConvNet for the 2020s;liu;CVPR,2022
5. Attention is all you need;vaswani;NIPS,2017