1. Graph-structured representations for visual question answering;Teney,2017
2. Multi-modal graph neural network for joint reasoning on vision and scene text;Gao,2020
3. Mucko: multi-layer cross-modal knowledge reasoning for fact-based visual question answering;Zhu,2020
4. GraphVQA: language-guided graph neural networks for scene graph question answering;Liang;NAACL-HLT 2021,2021
5. Visual question answering with attention transfer and a cross-modal gating mechanism;Li;Pattern Recognit. Lett.,2020