1. Acón D, Wu L (2018) Multimodal imaging in diabetic macular edema. The Asia-Pacific Journal of Ophthalmology 7(1):22–27
2. Carrington A, Manuel D, Fieguth P, Ramsay T, Osmani V, Wernly B, Bennett C, Hawken S, McInnes M, Magwood O, Sheikh Y, Holzinger A (2023) Deep ROC analysis and AUC as balanced average accuracy for improved classifier selection, audit and explanation. IEEE Trans Pattern Anal Mach Intell 45(1):329–341. https://doi.org/10.1109/TPAMI.2022.3145392
3. Chen C-F, Fan Q, Panda R (2021) CrossViT: Cross-attention multi-scale vision transformer for image classification. arXiv preprint arXiv:2103.14899
4. Chen C-F, Panda R, Fan Q (2021) RegionViT: Regional-to-local attention for vision transformers. arXiv preprint arXiv:2106.02689
5. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929