Authors:
Liu Zhihong, Wang Jianji, Chen Hui, Ma Yongqiang, Zheng Nanning
Funder:
National Natural Science Foundation of China
References (42 articles):
1. Yao, Exploring visual relationship for image captioning, 2018.
2. Yang, Auto-encoding scene graphs for image captioning, 2019.
3. Song, Spatial-temporal graphs for cross-modal Text2Video retrieval, IEEE Trans. Multimed., 2022.
4. D. Teney, L. Liu, A. van den Hengel, Graph-structured representations for visual question answering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1–9.
5. Zhu, Mucko: Multi-layer cross-modal knowledge reasoning for fact-based visual question answering, 2020.