Authors:
Ma Mingyang, Tohti Turdi, Hamdulla Askar
Funders:
National Natural Science Foundation of China
Natural Science Foundation of Xinjiang, China
Strengthening Plan of National Defense Science and Technology Foundation of China
Publisher:
Springer Science and Business Media LLC
References (44 articles):
1. Wang, P., Wu, Q., Shen, C., van den Hengel, A., Dick, A.: Explicit knowledge-based reasoning for visual question answering (2015). arXiv preprint arXiv:1511.02570
2. Marino, K., Rastegari, M., Farhadi, A., Mottaghi, R.: OK-VQA: a visual question answering benchmark requiring external knowledge. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3195–3204 (2019)
3. Wang, P., Wu, Q., Shen, C., Dick, A., van den Hengel, A.: FVQA: fact-based visual question answering. IEEE Trans. Pattern Anal. Mach. Intell. 40(10), 2413–2427 (2017)
4. Lu, J., Batra, D., Parikh, D., Lee, S.: ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
5. Tan, H., Bansal, M.: LXMERT: learning cross-modality encoder representations from transformers (2019). arXiv preprint arXiv:1908.07490