Impact of Binary-Valued Representation on the Performance of Cross-Modal Retrieval System
Published: 2022-12-01
Volume: 7, Issue: 6, Pages: 964-981
ISSN: 2455-7749
Container-title: International Journal of Mathematical, Engineering and Management Sciences
Short-container-title: Int. j. math. eng. manag. sci.
Language: en
Authors:
Bhatt Nikita (1), Ganatra Amit (2), Bhatt Nirav (3), Prajapati Purvi (3), Rahevar Mrugendra (1), Parmar Martin (1)
Affiliations:
1. U & P U. Patel Department of Computer Engineering, CSPIT, CHARUSAT, Gujarat, India.
2. Devang Patel Institute of Advance Technology and Research, CHARUSAT, Gujarat, India.
3. Smt. Kundanben Dinsha Patel Department of Information Technology, CSPIT, CHARUSAT, Gujarat, India.
Abstract
The tremendous proliferation of multi-modal data and the flexible needs of users have drawn attention to the field of Cross-Modal Retrieval (CMR), which can perform image-sketch matching, text-image matching, audio-video matching, and near infrared-visual image matching. Such retrieval is useful in many applications, such as criminal investigation, recommendation systems, and person re-identification. The real challenge in CMR is to preserve semantic similarities between the various modalities of data. To preserve these similarities, existing deep learning-based approaches use pairwise labels and generate binary-valued representations, which provide fast retrieval with low storage requirements; however, the relative similarity between heterogeneous data is ignored. The objective of this work is therefore to reduce the modality gap by preserving relative semantic similarities among modalities. A model named "Deep Cross-Modal Retrieval (DCMR)" is proposed, which takes triplet labels as input and generates binary-valued representations. The triplet labels place semantically similar data points near each other and dissimilar points far apart in the vector space. Extensive experiments are performed and the results are compared with deep learning-based approaches, showing that DCMR improves mean average precision (mAP) by 2% to 3% for Image→Text retrieval and by 2% to 5% for Text→Image retrieval on the MSCOCO, XMedia, and NUS-WIDE datasets. Thus, binary-valued representations generated from triplet labels preserve relative semantic similarities better than those generated from pairwise labels.
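The core mechanism summarized above is a triplet ranking objective over learned hash codes: an anchor from one modality is pulled toward a semantically matching item from the other modality and pushed at least a margin away from a mismatched one, and the continuous codes are binarized for retrieval. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' DCMR implementation; the network architecture, feature dimensions, code length, and margin value are all assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalHashNet(nn.Module):
    # Maps pre-extracted image and text features into a shared K-bit code space.
    # tanh is a continuous relaxation of {-1, +1}; sign() binarizes at test time.
    def __init__(self, img_dim=4096, txt_dim=300, code_bits=64):
        super().__init__()
        self.img_net = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_bits))
        self.txt_net = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_bits))

    def forward(self, img_feat, txt_feat):
        return torch.tanh(self.img_net(img_feat)), torch.tanh(self.txt_net(txt_feat))

def triplet_hash_loss(anchor, positive, negative, margin=1.0):
    # Pulls the matching cross-modal pair together and pushes the mismatched
    # pair at least `margin` farther away than the matching one.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage: image anchors with matching (positive) and non-matching (negative) texts.
model = CrossModalHashNet()
img = torch.randn(8, 4096)
txt_pos, txt_neg = torch.randn(8, 300), torch.randn(8, 300)
img_code, pos_code = model(img, txt_pos)
_, neg_code = model(img, txt_neg)
loss = triplet_hash_loss(img_code, pos_code, neg_code)
loss.backward()

# At retrieval time, binarize: sign(code) lies in {-1, +1}^K.
binary_code = torch.sign(img_code.detach())

Binarized codes are compared by Hamming distance, which is what gives hashing-based retrieval its speed and storage advantage over real-valued embeddings.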
Publisher
Ram Arti Publishers
Subject
General Engineering; General Business, Management and Accounting; General Mathematics; General Computer Science
Cited by
1 article.