Semantic Completion and Filtration for Image–Text Retrieval

Author:

Yang Song (1), Li Qiang (2), Li Wenhui (2), Li Xuan-Ya (3), Jin Ran (4), Lv Bo (5), Wang Rui (6), Liu Anan (1)

Affiliation:

1. Tianjin University, China; also with the Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China

2. Tianjin University, China

3. Baidu Inc., Beijing, China

4. Zhejiang Wanli University, Ningbo, China

5. The 30th Research Institute of China Electronics Technology Group Corporation, Chengdu, China

6. The 30th Research Institute of China Electronics Technology Group Corporation, China

Abstract

Image–text retrieval is a vital task in computer vision and has received growing attention because it connects cross-modality data. Its critical challenges are learning unified representations and closing the large gap between the visual and textual domains. Although many works have made significant progress in image–text retrieval over the past few decades, they still face the challenge of incomplete text descriptions of images, i.e., how to fully learn the correlations between relevant region–word pairs with semantic diversity. In this article, we propose a novel semantic completion and filtration (SCAF) method to alleviate this issue. Specifically, the text semantic completion module generates a complete semantic description of an image from multi-view text descriptions, guiding the model to fully explore the correlations of relevant region–word pairs. Meanwhile, the adaptive structural semantic matching module filters out irrelevant region–word pairs by considering the relevance score of each pair, which helps the model focus on learning the relevance of matching pairs. Extensive experiments show that SCAF outperforms existing methods on the Flickr30K and MSCOCO datasets, which demonstrates the superiority of the proposed method.
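
The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the SCAF implementation: it assumes cosine similarity as the relevance score and a fixed cutoff, whereas the abstract describes an adaptive module. The names `cosine`, `filtered_similarity`, and `threshold`, as well as the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filtered_similarity(regions, words, threshold=0.5):
    """Image-text score that ignores weakly related region-word pairs.

    Assumption: `threshold` is a fixed, hand-picked cutoff used only for
    illustration; the paper's matching module is described as adaptive.
    """
    # Relevance score for every region-word pair.
    scores = [cosine(r, w) for r in regions for w in words]
    # Filtration step: drop pairs whose relevance falls below the cutoff.
    kept = [s for s in scores if s >= threshold]
    # Aggregate only the surviving pairs (0.0 if nothing survives).
    return sum(kept) / len(kept) if kept else 0.0


# Toy features: two image regions and two caption words in a 2-D space.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 1.0]]
score = filtered_similarity(regions, words)
```

In this toy case the orthogonal pair (second region, first word) scores 0 and is discarded, so only the three related pairs contribute to the final score.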

Funder

National Natural Science Foundation of China

China Postdoctoral Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Cited by 10 articles.

1. Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-09-12

2. A method for image–text matching based on semantic filtering and adaptive adjustment;EURASIP Journal on Image and Video Processing;2024-08-29

3. Multi-view and region reasoning semantic enhancement for image-text retrieval;Multimedia Systems;2024-06-15

4. Object search by a concept-conditioned object detector;Neural Computing and Applications;2024-05-20

5. Universal Relocalizer for Weakly Supervised Referring Expression Grounding;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-05-16
