Exploring the Effectiveness of Binary-Valued and Real-Valued Representations for Cross-Modal Retrieval

Authors:

Bhatt Nikita1, Bhatt Nirav1, Prajapati Purvi1

Affiliation:

1. CSPIT, CHARUSAT

Abstract

Cross-modal retrieval (CMR) is the task of retrieving semantically related items across different modalities; for example, given an image query, the task is to retrieve relevant text descriptions or audio clips. One of the major challenges in CMR is the modality gap: the differences between the features and representations used to encode information in different modalities. To address the modality gap, researchers have developed techniques such as joint embedding, in which features from different modalities are mapped into a common embedding space where they can be compared directly. Binary-valued and real-valued representations are two different ways to represent data in such a space. A binary-valued representation is a discrete representation in which each element is either 0 or 1, whereas a real-valued representation encodes each item as a vector of real numbers. Both types of representation have advantages and disadvantages, and researchers continue to explore new techniques for generating representations that improve the performance of CMR systems. For the first time, the work presented here generates both representations and compares them through experiments on standard benchmark datasets using mean average precision (MAP). The results suggest that real-valued representations outperform binary-valued representations in terms of MAP, especially when the data are complex and high-dimensional. On the other hand, binary codes are more memory-efficient than real-valued embeddings and can be computed much faster; they can also be stored and transmitted easily, making them better suited to large-scale retrieval tasks.
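The trade-off described in the abstract can be sketched in a few lines of NumPy. The snippet below is purely illustrative and not the paper's method: it takes hypothetical 64-dimensional real-valued embeddings (stand-ins for the output of some joint-embedding model), compares them with cosine similarity, then binarizes them by sign-thresholding and compares the resulting codes with Hamming distance, noting the memory saving. It also includes a small average-precision helper of the kind used to compute MAP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64-d real-valued embeddings for one image and one text item,
# as if produced by a joint-embedding model (illustrative values only).
img_real = rng.standard_normal(64).astype(np.float32)
txt_real = rng.standard_normal(64).astype(np.float32)

# Real-valued retrieval: cosine similarity in the common embedding space.
cos_sim = float(img_real @ txt_real /
                (np.linalg.norm(img_real) * np.linalg.norm(txt_real)))

# Binary-valued retrieval: sign-threshold each embedding into a 64-bit code
# and compare codes with Hamming distance (number of differing bits).
img_bin = img_real > 0
txt_bin = txt_real > 0
hamming = int(np.count_nonzero(img_bin != txt_bin))

# Memory footprint: 64 float32 values vs. a packed 64-bit binary code.
real_bytes = img_real.nbytes                 # 64 * 4 = 256 bytes
binary_bytes = np.packbits(img_bin).nbytes   # 64 / 8 = 8 bytes

def average_precision(relevant, ranked):
    """Average precision for one query; MAP is the mean over all queries."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / max(len(relevant), 1)

print(cos_sim, hamming, real_bytes, binary_bytes)
print(average_precision({"a", "b"}, ["a", "x", "b"]))
```

The 32x storage reduction (256 bytes down to 8) and the cheap bitwise Hamming comparison are what make binary codes attractive at scale, while the continuous cosine score retains fine-grained similarity information that thresholding discards.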

Publisher

Research Square Platform LLC

