1. Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021)
2. Beyer, L., Hénaff, O.J., Kolesnikov, A., Zhai, X., van den Oord, A.: Are we done with ImageNet? arXiv preprint arXiv:2006.07159 (2020)
3. Brendel, W., Bethge, M.: Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In: ICLR (2019)
4. Brown, T., et al.: Language models are few-shot learners. In: NeurIPS (2020)
5. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: CVPR (2018)