1. Abacha, A.B., Hasan, S.A., Datla, V.V., Liu, J., Demner-Fushman, D., Müller, H., 2019. VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019. In: CLEF (Working Notes), Vol. 2.
2. Acosta, 2022. Multimodal biomedical AI. Nat. Med.
3. Bao, H., Dong, L., Piao, S., Wei, F., 2021. BEiT: BERT Pre-Training of Image Transformers. In: International Conference on Learning Representations.
4. Bao, 2022. VLMo: Unified vision-language pre-training with mixture-of-modality-experts. Adv. Neural Inf. Process. Syst.
5. Beltagy, I., Lo, K., Cohan, A., 2019. SciBERT: A Pretrained Language Model for Scientific Text. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. EMNLP-IJCNLP, pp. 3615–3620.