1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Polosukhin I (2017) Attention is all you need. Retrieved from http://arxiv.org/abs/1706.03762
2. Lin T-Y, Maire M, Belongie S, Bourdev L, Girshick R, Hays J, Dollár P (2014) Microsoft COCO: common objects in context. Retrieved from http://arxiv.org/abs/1405.0312
3. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Houlsby N (2020) An image is worth 16x16 words: transformers for image recognition at scale. Retrieved from http://arxiv.org/abs/2010.11929
4. Trockman A, Kolter JZ (2022) Patches are all you need? Retrieved from http://arxiv.org/abs/2201.09792
5. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031