1. VQA: visual question answering;Antol,2015
2. Information fusion in visual question answering: a survey;Zhang;Inf. Fusion,2019
3. A new approach to cross-modal multimedia retrieval;Rasiwasia,2010
4. Every picture tells a story: generating sentences from images;Farhadi,2010
5. The MIR Flickr retrieval evaluation;Huiskes,2008