1. Attention is all you need;Vaswani;Advances in Neural Information Processing Systems,2017
2. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy;International Conference on Learning Representations,2021
3. Training data-efficient image transformers & distillation through attention;Touvron;International Conference on Machine Learning,2021
4. Tokens-to-Token ViT: Training vision transformers from scratch on ImageNet;Yuan;International Conference on Computer Vision,2021
5. BinaryConnect: Training deep neural networks with binary weights during propagations;Courbariaux;Advances in Neural Information Processing Systems,2015