Modality-Dependent Cross-Media Retrieval

Published: 2016-07-14 Issue: 4 Volume: 7 Page: 1-13
ISSN: 2157-6904
Container-title: ACM Transactions on Intelligent Systems and Technology
Language: en
Short-container-title: ACM Trans. Intell. Syst. Technol.

Author:

Wei Yunchao¹, Zhao Yao¹, Zhu Zhenfeng¹, Wei Shikui¹, Xiao Yanhui¹, Feng Jiashi², Yan Shuicheng²

Affiliation:

1. Beijing Jiaotong University, Beijing, China

2. National University of Singapore, Singapore

Abstract

In this article, we investigate cross-media retrieval between images and text, that is, using an image to search for text (I2T) and using text to search for images (T2I). Existing cross-media retrieval methods usually learn one pair of projections, by which the original features of images and text are projected into a common latent space to measure content similarity. However, using the same projections for the two different retrieval tasks (I2T and T2I) may lead to a tradeoff between their respective performances rather than the best performance on each. Unlike previous work, we propose a modality-dependent cross-media retrieval (MDCR) model, in which two pairs of projections are learned, one for each retrieval task, instead of a single pair. Specifically, by jointly optimizing the correlation between images and text and the linear regression from one modal space (image or text) to the semantic space, two pairs of mappings are learned to project images and text from their original feature spaces into two common latent subspaces (one for I2T and the other for T2I). Extensive experiments show the superiority of the proposed MDCR over other methods. In particular, based on the 4,096-dimensional convolutional neural network (CNN) visual feature and the 100-dimensional Latent Dirichlet Allocation (LDA) textual feature, the proposed method achieves an mAP score of 41.5%, which is a new state-of-the-art performance on the Wikipedia dataset.
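The idea of learning a separate pair of projections per retrieval task can be illustrated with a small NumPy sketch. It is only a toy under stated assumptions: the objective (a squared-error correlation term plus a regression term from the query modality to a one-hot semantic matrix), the weights `lam` and `eta`, the alternating ridge solver, the function names, and the synthetic data are all chosen here for illustration and are not taken from the abstract.

```python
import numpy as np

def objective(A, B, S, P, Q, lam, eta):
    # Assumed objective: correlation term + regression of the query modality A onto S.
    corr = np.linalg.norm(A @ P - B @ Q) ** 2
    regress = np.linalg.norm(A @ P - S) ** 2
    penalty = np.linalg.norm(P) ** 2 + np.linalg.norm(Q) ** 2
    return lam * corr + (1 - lam) * regress + eta * penalty

def fit_pair(A, B, S, lam=0.5, eta=1e-2, iters=20):
    """Learn one pair of projections (P for query modality A, Q for modality B)."""
    dA, dB, c = A.shape[1], B.shape[1], S.shape[1]
    P = np.zeros((dA, c))
    Q = np.zeros((dB, c))
    GA = A.T @ A + eta * np.eye(dA)          # normal equations for the P update
    GB = lam * (B.T @ B) + eta * np.eye(dB)  # normal equations for the Q update
    history = []
    for _ in range(iters):
        # Exact minimizer over P with Q fixed.
        P = np.linalg.solve(GA, A.T @ (lam * (B @ Q) + (1 - lam) * S))
        # Exact minimizer over Q with P fixed.
        Q = np.linalg.solve(GB, lam * (B.T @ (A @ P)))
        history.append(objective(A, B, S, P, Q, lam, eta))
    return P, Q, history

# Synthetic stand-ins for image features X, text features Y, and class labels.
rng = np.random.default_rng(0)
n, c, dx, dy = 60, 3, 8, 5
labels = rng.integers(0, c, size=n)
S = np.eye(c)[labels]  # one-hot semantic matrix
X = S @ rng.normal(size=(c, dx)) + 0.1 * rng.normal(size=(n, dx))
Y = S @ rng.normal(size=(c, dy)) + 0.1 * rng.normal(size=(n, dy))

# Pair 1 (I2T): images are the query, so the regression term acts on X.
U1, V1, hist_i2t = fit_pair(X, Y, S)
# Pair 2 (T2I): text is the query, so the regression term acts on Y.
V2, U2, hist_t2i = fit_pair(Y, X, S)
```

In this sketch the two tasks differ only in which modality carries the regression term, which is why `fit_pair` is called twice with the roles of `X` and `Y` swapped. Each update solves its subproblem exactly, so the recorded objective values in `hist_i2t` and `hist_t2i` should not increase from one iteration to the next.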

Funder

National Basic Research Program of China

Fundamental Scientific Research Project

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2775109

References: 27 articles.

1. A multi-view embedding space for modeling internet images, tags, and their semantics;Gong Y.;International Journal of Computer Vision,2013

2. Canonical Correlation Analysis: An Overview with Application to Learning Methods

Cited by 73 articles.

1. Accurate multi-view clustering to seek the cross-viewed yet uniform sample assignment via tensor feature matching;Information Sciences;2024-04

2. Unsupervised Dual Hashing Coding (UDC) on Semantic Tagging and Sample Content for Cross-Modal Retrieval;IEEE Transactions on Multimedia;2024

3. Cross-Media Retrieval Based on Two-Level Similarity and Collaborative Representation;Traitement du Signal;2023-10-30

4. ONION: Online Semantic Autoencoder Hashing for Cross-Modal Retrieval;ACM Transactions on Intelligent Systems and Technology;2023-02-16

5. Bagging-based cross-media retrieval algorithm;Soft Computing;2022-11-14