Deep Adversarial Learning Triplet Similarity Preserving Cross-Modal Retrieval Algorithm

Published: 2022-07-25
Volume: 10, Issue: 15, Page: 2585
ISSN: 2227-7390
Journal: Mathematics
Language: en

Authors:

Li Guokun, Wang Zhen, Xu Shibo, Feng Chuang, Yang Xiaohan, Wu Nannan, Sun Fuzhen

Abstract

Cross-modal retrieval returns nearest neighbors from a modality different from the query's, for example text for an image query or images for a text query. However, samples from different modalities follow inconsistent distributions and have diverse representations, so the similarity between them cannot be measured directly; this mismatch is known as the heterogeneity gap. To bridge this gap, we propose a deep adversarial learning triplet similarity preserving cross-modal retrieval algorithm that maps samples from different modalities into a common space, where their feature representations preserve the original inter- and intra-modal semantic similarity relationships. During training, we use generative adversarial networks (GANs), which are well suited to modeling data distributions and learning discriminative representations, to learn the features of each modality. As a result, the feature distributions of the different modalities are aligned.

Many cross-modal retrieval algorithms preserve only the inter-modal similarity relationship, which makes their nearest-neighbor retrieval results vulnerable to noise. In contrast, we establish a triplet similarity preserving function that simultaneously preserves the inter-modal similarity relationship in the common space and the intra-modal similarity relationship in each modal space, which gives the proposed algorithm strong robustness to noise. In each modal space, to ensure that the generated features carry the same semantic information as the sample labels, we establish a linear classifier and require its classification results on the generated features to be consistent with the sample labels.

We conducted comparative cross-modal retrieval experiments on two widely used benchmark datasets, Pascal Sentence and Wikipedia. For the image-to-text task, the proposed method improved the mAP values by 1% and 0.7% on the Pascal Sentence and Wikipedia datasets, respectively. For the text-to-image task, it improved the mAP values by 0.6% and 0.8% on the Pascal Sentence and Wikipedia datasets, respectively. The experimental results show that the proposed algorithm outperforms the other state-of-the-art methods it was compared with.
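The abstract does not give the exact form of the triplet similarity preserving function. The sketch below assumes the standard hinge triplet loss with Euclidean distance and a margin, applied once to an inter-modal triplet in the common space and once to an intra-modal triplet in the image modal space. The function names, the margin value and the equal weighting of the two terms are illustrative assumptions, not the authors' definitions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_hinge(anchor: Vector, positive: Vector, negative: Vector,
                  margin: float = 1.0) -> float:
    """Standard hinge triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def triplet_similarity_loss(img_common: Vector, txt_pos: Vector, txt_neg: Vector,
                            img_modal: Vector, img_pos: Vector, img_neg: Vector,
                            margin: float = 1.0) -> float:
    """Hypothetical combined objective for one image anchor.

    Inter-modal term (common space): the anchor should be closer to a
    same-label text feature than to a different-label text feature.
    Intra-modal term (image modal space): the anchor should be closer to a
    same-label image feature than to a different-label image feature.
    Equal weighting of the two terms is an assumption of this sketch."""
    inter = triplet_hinge(img_common, txt_pos, txt_neg, margin)
    intra = triplet_hinge(img_modal, img_pos, img_neg, margin)
    return inter + intra


if __name__ == "__main__":
    # Inter-modal triplet already satisfies the margin; intra-modal one does not.
    loss = triplet_similarity_loss([0.0, 0.0], [1.0, 0.0], [4.0, 0.0],
                                   [0.0, 0.0], [0.0, 2.0], [0.0, 2.5])
    print(loss)
```

For the text-to-image direction the roles swap: a text anchor is compared with image positives and negatives in the common space, and with text positives and negatives in the text modal space. Keeping both terms is what lets the intra-modal neighborhood structure constrain the embedding, which is the abstract's stated reason for robustness to noise.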
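The label-consistency requirement can be read as training a linear classifier on the generated features with a classification loss. The sketch below assumes softmax cross-entropy, which the abstract does not name; the weight matrix, biases and function names are hypothetical, and whether one classifier is shared across modalities or one is used per modal space is not stated in the abstract, so here it is simply a function called on the features of one modality.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def linear_logits(feature: Vector, weights: Sequence[Vector],
                  biases: Vector) -> List[float]:
    """One logit per class: weights has one row per class."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, biases)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax with the max subtracted for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_consistency_loss(features: Sequence[Vector], labels: Sequence[int],
                           weights: Sequence[Vector], biases: Vector) -> float:
    """Mean cross-entropy between the classifier's predictions on the
    generated features and the samples' ground-truth class labels."""
    total = 0.0
    for feature, label in zip(features, labels):
        probs = softmax(linear_logits(feature, weights, biases))
        total += -math.log(probs[label])
    return total / len(features)
```

A feature whose predicted class matches its label yields a small loss, so minimizing this term pushes the generated features to keep label-level semantics in each modal space.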
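The reported results are mAP values for image-to-text and text-to-image retrieval. The sketch below shows one common way to compute them, assuming (as is usual on Pascal Sentence and Wikipedia, though the abstract does not say so) that a retrieved item is relevant when it shares the query's class label, that the gallery is ranked by cosine similarity in the common space, and that each average precision is normalized by the number of relevant items in the ranked list. Some papers instead evaluate only the top R results; the cut-off used by the authors is not given in the abstract. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def rank_gallery(query: Vector, gallery: Sequence[Vector]) -> List[int]:
    """Gallery indices sorted from most to least similar to the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: cosine(query, gallery[i]), reverse=True)


def average_precision(ranked_labels: Sequence[int], query_label: int) -> float:
    """AP over a ranked list: mean of precision@k at each relevant position."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Vector], query_labels: Sequence[int],
                           gallery: Sequence[Vector],
                           gallery_labels: Sequence[int]) -> float:
    """mAP for, e.g., image queries against a text gallery in the common space."""
    aps = []
    for query, q_label in zip(queries, query_labels):
        order = rank_gallery(query, gallery)
        aps.append(average_precision([gallery_labels[i] for i in order], q_label))
    return sum(aps) / len(aps)
```

For image-to-text retrieval, `queries` are the common-space image features and `gallery` the common-space text features; swapping the two gives the text-to-image score.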

Funders

National Natural Science Foundation of China

Natural Science Foundation of Shandong Province of China

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/10/15/2585/pdf
