1. Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, et al. VQA: Visual question answering. In: Proceedings of the IEEE international conference on computer vision. 2015, p. 2425–33.
2. Radford A, et al. Learning transferable visual models from natural language supervision. 2021.
3. Li LH, et al. VisualBERT: A simple and performant baseline for vision and language. 2019.
4. Lobry S, et al. RSVQA: Visual question answering for remote sensing data. IEEE Trans Geosci Remote Sens 2020.
5. Lobry S, Demir B, Tuia D. RSVQA Meets BigEarthNet: A New, Large-Scale, Visual Question Answering Dataset for Remote Sensing. In: IEEE international symposium on geoscience and remote sensing. 2021, p. 1218–21.