A Cross-Modal Hash Retrieval Method with Fused Triples
Published: 2023-09-21
Issue: 18
Volume: 13
Page: 10524
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Authors:
Li Wenxiao1, Mei Hongyan1, Li Yutian1, Yu Jiayao1, Zhang Xing1, Xue Xiaorong1, Wang Jiahao1
Affiliation:
1. College of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Abstract
Due to its fast retrieval speed and low storage cost, cross-modal hashing has become the primary method for cross-modal retrieval. Since the emergence of deep cross-modal hashing methods, cross-modal retrieval performance has improved significantly. However, existing cross-modal hash retrieval methods still fail to effectively utilize the supervisory information in the dataset and lack similarity expression ability. This means that the label information is not fully exploited and the latent semantic relationship between the two modalities cannot be fully explored, which impairs the judgment of semantic similarity between modalities. To address these problems, this paper proposes Tri-CMH, a cross-modal hash retrieval method with fused triples, an end-to-end modeling framework consisting of two parts: feature extraction and hash learning. First, the multi-modal data are preprocessed into triples, and a data supervision matrix is constructed so that samples with semantically similar labels are aggregated while samples with semantically dissimilar labels are separated; this avoids under-utilizing the dataset's supervisory information and makes efficient use of the global supervisory information. Meanwhile, the loss function of the hash-learning part is optimized by jointly considering the Hamming distance loss, the intra-modality loss, the cross-modality loss, and the quantization loss, so as to explicitly constrain semantically similar and semantically dissimilar hash codes and to improve the model's ability to judge cross-modal semantic similarity. The method is trained and evaluated on the IAPR-TC12, MIRFLICKR-25K, and NUS-WIDE datasets using mAP and PR curves as evaluation criteria, and the experimental results demonstrate its effectiveness and practicality.
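As a rough illustration of the loss composition sketched in the abstract, the following shows how a cross-modal triplet term (a Euclidean relaxation of the Hamming distance on continuous hash codes) can be combined with a quantization term that pushes codes toward {-1, +1}. The function name, weights, and the relaxation are assumptions for this sketch, not taken from the paper; the paper's intra-modality term is omitted here for brevity.

```python
import numpy as np

def tri_cmh_loss(img_a, txt_p, txt_n, margin=1.0, alpha=0.5):
    """Illustrative triplet-style cross-modal hashing loss (not the paper's exact formulation).

    img_a : continuous (relaxed) hash code of the image anchor, shape (k,)
    txt_p : code of a semantically similar text sample (positive)
    txt_n : code of a semantically dissimilar text sample (negative)
    """
    # Cross-modal triplet term: pull the similar pair together and push
    # the dissimilar pair at least `margin` apart, using squared Euclidean
    # distance as a differentiable proxy for Hamming distance.
    d_pos = np.sum((img_a - txt_p) ** 2)
    d_neg = np.sum((img_a - txt_n) ** 2)
    triplet = max(0.0, margin + d_pos - d_neg)

    # Quantization term: penalize the gap between each relaxed code and
    # its nearest binary code sign(c), encouraging entries near {-1, +1}.
    quant = sum(np.sum((np.sign(c) - c) ** 2) for c in (img_a, txt_p, txt_n))

    return triplet + alpha * quant
```

With exactly binary codes where the positive matches the anchor and the negative is its complement, both terms vanish; when the positive and negative roles are swapped, the triplet term activates.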
Funder
Liaoning Education Department Scientific Research Project General project of Liaoning Provincial Department of Education
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
References: 34 articles.
Cited by: 1 article.