Scene graph semantic inference for image and text matching

Published: 2022-09-14
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Pei Jiaming¹, Zhong Kaiyang², Yu Zhi³, Wang Lukun⁴, Lakshmanna Kuruva⁵

Affiliation:

1. School of Computer Science, University of Sydney, Australia

2. School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, China

3. School of Microelectronics and Communication Engineering, Chongqing University, China

4. (Corresponding author) College of Intelligent Equipment, Shandong University of Science and Technology, China

5. School of Information Technology and Engineering, Vellore Institute of Technology, India

Abstract

With the rapid development of information technology, the volume of image and text data has increased dramatically. Image and text matching techniques enable computers to understand information from both the visual and textual modalities and to match them based on semantic content. Existing methods focus on co-occurrence statistics of visual and textual objects and learn only coarse-level associations. However, the lack of intramodal semantic inference leads to the failure of fine-level association between modalities. Scene graphs can capture the interactions between visual and textual objects and model intramodal semantic associations, which are crucial for understanding the scenes contained in images and text. In this paper, we propose a novel scene graph semantic inference network (SGSIN) for image and text matching that effectively learns fine-level semantic information in vision and text to help bridge cross-modal discrepancies. Specifically, we design two matching modules and construct scene graphs within each matching module to aggregate neighborhood information, refine the semantic representation of each object, and achieve fine-level alignment of the visual and textual modalities. We perform extensive experiments on Flickr30k and MSCOCO and achieve state-of-the-art results, which validate the advantages of our proposed approach.
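
The neighborhood aggregation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the SGSIN implementation, whose details are not given on this page: the function names, the mixing weight `alpha`, the mean-pooling rule, and the toy graph are all assumptions chosen only to show the general idea of refining each object's representation from its scene graph neighbors before comparing representations by cosine similarity.

```python
from math import sqrt

def aggregate_neighbors(features, edges, alpha=0.5):
    """Hypothetical helper: blend each node's vector with the mean of its neighbors.

    features: dict mapping node name -> list of floats (same length for all nodes)
    edges:    list of (node_a, node_b) pairs, treated as undirected relations
    alpha:    assumed mixing weight given to the neighborhood mean
    """
    neighbors = {node: [] for node in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    refined = {}
    for node, vec in features.items():
        nbrs = neighbors[node]
        if not nbrs:
            # Isolated nodes keep their original representation.
            refined[node] = list(vec)
            continue
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(vec))]
        refined[node] = [(1 - alpha) * v + alpha * m for v, m in zip(vec, mean)]
    return refined

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy scene graph (illustrative values only): "man" relates to "horse" and "hat".
features = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "hat": [1.0, 1.0]}
edges = [("man", "horse"), ("man", "hat")]
refined = aggregate_neighbors(features, edges)
```

In this toy setup, each refined vector carries some context from related objects, so a later similarity comparison between a visual node and a textual node reflects relations as well as the objects themselves. A learned model would replace the fixed mean and `alpha` with trained weights.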

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3563390

Reference: 77 articles.

1. Taghreed Abdullah and Lalitha Rangarajan. 2021. Image-Text Matching: Methods and Challenges. Inventive Systems and Control (2021), 213–222.

2. SPICE: Semantic Propositional Image Caption Evaluation

3. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

4. A Multimodal Deep Framework for Derogatory Social Media Post Identification of a Recognized Person

5. Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Cited by 7 articles.

1. Incorporating bidirectional feature pyramid network and lightweight network: a YOLOv5-GBC distracted driving behavior detection model;Neural Computing and Applications;2023-10-05

2. Chinese Text De-Colloquialization Technique Based on Back-Translation Strategy and End-to-End Learning;Applied Sciences;2023-09-29

3. Neutron transport calculation for the BEAVRS core based on the LSTM neural network;Scientific Reports;2023-09-06

4. Research Constituents and Trends in Smart Farming: An Analytical Retrospection from the Lens of Text Mining;Journal of Sensors;2023-08-11

5. ECBTNet: English-Foreign Chinese intelligent translation via multi-subspace attention and hyperbolic tangent LSTM;Neural Computing and Applications;2023-06-18