1. I. Turc, M.-W. Chang, K. Lee, and K. Toutanova, “Well-read students learn better: On the importance of pre-training compact models,” arXiv:1908.08962v2 (2019).
2. An image is worth 16 × 16 words: Transformers for image recognition at scale
3. CvT: Introducing convolutions to vision transformers, 2021
4. An empirical study of training self-supervised vision transformers, 2021
5. Emerging properties in self-supervised vision transformers, 2021