Efficient modal-aware feature learning with application in multimodal hashing

Authors:

Chu Hanlu 1, Zeng Haien 2, Lai Hanjiang 2, Tang Yong 1

Affiliations:

1. School of Computer Science, South China Normal University, Guangdong, China

2. School of Computer Science and Engineering, Sun Yat-Sen University, Guangdong, China

Abstract

Many retrieval applications can benefit from multiple modalities, and how to represent multimodal data is the critical component. Most deep multimodal learning methods construct the joint representation in two steps: 1) learning multiple intermediate features, one per modality, using separate and independent deep models; 2) merging the intermediate features into a joint representation using a fusion strategy. However, in the first step, these intermediate features have no prior knowledge of each other and cannot fully exploit the information contained in the other modalities. In this paper, we present a modal-aware operation as a generic building block to capture the non-linear dependencies among the heterogeneous intermediate features, so that the underlying correlation structures in the data of the other modalities can be learned as early as possible. The modal-aware operation consists of a kernel network and an attention network. The kernel network learns the non-linear relationships with other modalities. The attention network finds the informative regions of these modal-aware features that are favorable for retrieval. We verify the proposed modal-aware feature learning on the multimodal hashing task. Experiments conducted on three public benchmark datasets demonstrate significant improvements in the performance of our method relative to state-of-the-art methods.

Publisher

IOS Press

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science

