Cross-modal retrieval based on multi-dimensional feature fusion hashing

Published: 2024-06-19 Volume: 12
ISSN:2296-424X
Container-title:Frontiers in Physics
Short-container-title:Front. Phys.

Author:

Ren Dongxiao, Xu Weihua

Abstract

With the continued advance and spread of information network technology, multi-modal data, including text, images, video, and audio, is growing rapidly. Users often need to retrieve data in one modality with a query in another, so cross-modal retrieval has both theoretical significance and practical value. Because data from different modalities can be retrieved against one another once mapped into a unified Hamming space, hash codes are widely used in cross-modal retrieval. However, existing cross-modal hashing models generate hash codes from single-dimension data features and ignore the semantic correlation between features in different dimensions. To address this, a cross-modal retrieval method based on Multi-Dimensional Feature Fusion Hashing (MDFFH) is proposed. To better capture an image’s multi-dimensional semantic features, a convolutional neural network and a Vision Transformer are combined to construct an image multi-dimensional fusion module. Similarly, a text multi-dimensional fusion module is applied to the text modality to obtain the text’s multi-dimensional semantic features. Through feature fusion, these two modules integrate the semantic features of data in different dimensions, making the generated hash codes more representative and semantically richer. Extensive experiments and the corresponding analysis on two datasets indicate that MDFFH outperforms the baseline models.
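The retrieval mechanics the abstract refers to (fuse features, binarize them into a hash code, rank by Hamming distance) can be illustrated with a minimal sketch. This is not the MDFFH model: the function names are hypothetical, concatenation stands in for the learned fusion modules, and a fixed random projection stands in for the trained hashing networks, so the codes it produces carry no learned semantics.

```python
import random

def fuse(features_a, features_b):
    """Concatenate two feature vectors (assumed stand-in for learned fusion)."""
    return list(features_a) + list(features_b)

def make_projection(in_dim, n_bits, seed=0):
    """Random Gaussian projection (assumed stand-in for a trained hash layer)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_bits)]

def to_hash_code(features, projection):
    """Project the features, then binarize each output by its sign."""
    return [
        1 if sum(w * x for w, x in zip(row, features)) >= 0 else 0
        for row in projection
    ]

def hamming_distance(code_a, code_b):
    """Count the bit positions where two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def retrieve(query_code, database_codes):
    """Return database indices ordered by increasing Hamming distance."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )

# Usage: hash a fused feature vector and rank a small database of codes.
fused = fuse([0.2, -1.0], [0.7, 0.1])             # e.g. two feature views of one item
projection = make_projection(in_dim=4, n_bits=8)  # 8-bit codes
query = to_hash_code(fused, projection)
ranking = retrieve(query, [[0] * 8, query, [1] * 8])
```

Because every modality is reduced to a bit string of the same length, a query from one modality can be compared with items from another using only bit comparisons, which is what makes Hamming-space retrieval cheap.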

Publisher

Frontiers Media SA
