1. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020
2. Imagenet: A large-scale hierarchical image database;Deng,2009
3. Training data-efficient image transformers & distillation through attention;Touvron,2021
4. TinyMIM: An empirical study of distilling MIM pre-trained models;Ren,2023
5. Vitkd: Practical guidelines for vit feature knowledge distillation;Yang,2022