Exploring the Effectiveness of Binary-Valued and Real-Valued Representations for Cross-Modal Retrieval

Published: 2023-03-28

Author:

Bhatt Nikita¹, Bhatt Nirav¹, Prajapati Purvi¹

Affiliation:

1. CSPIT, CHARUSAT

Abstract

Cross-modal retrieval (CMR) refers to the task of retrieving semantically related items across different modalities. For example, given an image query, the task is to retrieve relevant text descriptions or audio clips. One of the major challenges in CMR is the modality gap: the features and representations used to encode information differ from one modality to another, so they cannot be compared directly. To address the modality gap, researchers have developed techniques such as joint embedding, in which features from different modalities are mapped to a common embedding space where they can be compared directly. Binary-valued and real-valued representations are two different ways to represent data in that space. A binary-valued representation is a discrete representation in which each element is either 0 or 1. A real-valued representation, on the other hand, represents each item as a vector of real numbers. Both types of representation have advantages and disadvantages, and researchers continue to explore new techniques for generating representations that improve the performance of CMR systems. For the first time, the work presented here generates both representations and compares them through experiments on standard benchmark datasets, using mean average precision (MAP) as the metric. The results suggest that real-valued representations outperform binary-valued representations in terms of MAP, especially when the data is complex and high-dimensional. On the other hand, binary codes are more memory-efficient than real-valued embeddings, and they can be computed much faster. Moreover, binary codes can be easily stored and transmitted, making them more suitable for large-scale retrieval tasks.
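To make the comparison in the abstract concrete, the following is a minimal sketch of how retrieval with the two representation types and the MAP metric could look. It is not the authors' method. The function names, the zero-threshold binarization, and the use of cosine similarity for real-valued vectors and Hamming distance for binary codes are all assumptions chosen for illustration, and the embeddings are assumed to already lie in a common space.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def binarize(vec):
    """Assumed scheme: threshold each element at zero to get a 0/1 code."""
    return [1 if x > 0 else 0 for x in vec]

def hamming_distance(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_real(query, database):
    """Database indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(database)),
                  key=lambda i: -cosine_similarity(query, database[i]))

def rank_binary(query_code, database_codes):
    """Database indices sorted by ascending Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming_distance(query_code, database_codes[i]))

def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears.

    `relevant` is the set of database indices relevant to the query.
    """
    hits = 0
    precisions = []
    for k, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings, relevant_sets):
    """MAP: average of the per-query average precision values."""
    scores = [average_precision(r, rel)
              for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)
```

In this sketch, thresholding discards the magnitude of each dimension, so distinct real-valued vectors can collapse to the same code and tie in the Hamming ranking. That is one intuition for the accuracy gap the abstract reports, while each binary element needs only one bit of storage instead of a floating-point value.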

Publisher

Research Square Platform LLC
