1. Sahu T (2022) Visual question answering with multimodal transformers. https://medium.com/data-science-at-microsoft/visual-question-answering-with-multimodal-transformers-d4f57950c867
2. Ben Abacha A, Hasan SA, Datla VV, Liu J, Demner-Fushman D, Müller H (2019) VQA-Med: overview of the medical visual question answering task at ImageCLEF 2019. In: Working Notes of CLEF 2019. CEUR Workshop Proceedings, vol. 2380. CEUR-WS.org, Lugano, Switzerland. https://ceur-ws.org/Vol-2380/paper_272.pdf
3. Lau JJ, Gayen S, Demner D, Ben Abacha A (2019) Visual question answering in radiology (VQA-RAD). OSF. https://doi.org/10.17605/OSF.IO/89KPS
4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2023) Attention is all you need. arXiv:1706.03762
5. Agrawal A, Lu J, Antol S, Mitchell M, Zitnick CL, Batra D, Parikh D (2016) VQA: visual question answering. arXiv:1505.00468