Similarity Reasoning and Filtration for Image-Text Matching

Published:2021-05-18 Issue:2 Volume:35 Page:1218-1226
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Diao Haiwen, Zhang Ying, Ma Lin, Lu Huchuan

Abstract

Image-text matching plays a critical role in bridging vision and language, and great progress has been made by exploiting the global alignment between image and sentence, or local alignments between regions and words. However, how to make the most of these alignments to infer more accurate matching scores remains underexplored. In this paper, we propose a novel Similarity Graph Reasoning and Attention Filtration (SGRAF) network for image-text matching. Specifically, vector-based similarity representations are first learned to characterize the local and global alignments more comprehensively, and then the Similarity Graph Reasoning (SGR) module, which relies on one graph convolutional neural network, is introduced to infer relation-aware similarities from both the local and global alignments. The Similarity Attention Filtration (SAF) module is further developed to integrate these alignments effectively by selectively attending to the significant and representative alignments while casting aside the interference of non-meaningful alignments. We demonstrate the superiority of the proposed method by achieving state-of-the-art performance on the Flickr30K and MSCOCO datasets, and the good interpretability of SGR and SAF through extensive qualitative experiments and analyses.
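
The idea of representing each alignment as a vector and then attending over those vectors can be illustrated with a small toy sketch. This is not the authors' implementation: the similarity function (a normalised element-wise squared difference with no learned projection), the softmax pooling, the scoring vector `w`, and all feature values are assumptions chosen only to make the example runnable, and the graph reasoning (SGR) step is omitted entirely.

```python
import math

def similarity_vector(x, y):
    """Vector-valued similarity: element-wise squared difference, L2-normalised.
    Assumption: no learned projection is applied here."""
    diff = [(a - b) ** 2 for a, b in zip(x, y)]
    norm = math.sqrt(sum(d * d for d in diff)) or 1.0  # guard against a zero vector
    return [d / norm for d in diff]

def attention_filter(sim_vectors, w):
    """Softmax-weighted sum of alignment vectors.
    `w` is a hypothetical scoring vector standing in for learned parameters."""
    scores = [sum(wi * si for wi, si in zip(w, s)) for s in sim_vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sim_vectors[0])
    pooled = [sum(wt * s[i] for wt, s in zip(weights, sim_vectors)) for i in range(dim)]
    return weights, pooled

# Toy features (made up): one global image/sentence pair and two region/word pairs.
image_global, sentence_global = [0.2, 0.9, 0.4], [0.1, 0.8, 0.5]
regions = [[0.2, 0.9, 0.4], [0.9, 0.1, 0.3]]
words = [[0.1, 0.8, 0.5], [0.3, 0.7, 0.6]]

# One global alignment followed by the local alignments.
alignments = [similarity_vector(image_global, sentence_global)]
alignments += [similarity_vector(r, t) for r, t in zip(regions, words)]

weights, pooled = attention_filter(alignments, w=[1.0, -0.5, 0.25])
print(weights)  # one attention weight per alignment, summing to 1
print(pooled)   # a single aggregated similarity vector
```

In a trained model the pooled vector would typically be mapped to a scalar matching score; here it is left as a vector, since the abstract does not specify that step.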

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 146 articles.

1. GADNet: Improving image–text matching via graph-based aggregation and disentanglement;Pattern Recognition;2025-01

2. Cross-modality interaction reasoning for enhancing vision-language pre-training in image-text retrieval;Applied Intelligence;2024-09-11

3. A method for image–text matching based on semantic filtering and adaptive adjustment;EURASIP Journal on Image and Video Processing;2024-08-29

4. Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching;ACM Transactions on Information Systems;2024-08-19

5. SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-08-16