Weakly Supervised Hashing with Reconstructive Cross-modal Attention

Published:2023-07-12 Issue:6 Volume:19 Page:1-19
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Du Yongchao¹, Wang Min², Lu Zhenbo², Zhou Wengang¹, Li Houqiang¹

Affiliation:

1. University of Science and Technology of China

2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

Abstract

On many popular social websites, images are usually associated with metadata such as textual tags, which carry semantic information relevant to the image and can be used to supervise representation learning for image retrieval. However, these user-provided tags are usually polluted by noise, so the main challenge lies in mining the potentially useful information from the noisy tags. Many previous works simply treat all tags equally when generating supervision, which inevitably distracts network learning. To address this, we propose a new framework, termed Weakly Supervised Hashing with Reconstructive Cross-modal Attention (WSHRCA), to learn compact visual-semantic representations with more reliable supervision for the retrieval task. Specifically, for each image-tag pair, the weak supervision from tags is refined by cross-modal attention, which takes the image feature as the query to aggregate the most content-relevant tags. As a result, tags with relevant content become more prominent while noisy tags are suppressed, which provides more accurate supervisory information. To improve the effectiveness of hash learning, the image embedding in WSHRCA is reconstructed from the hash code, which is further optimized by a cross-modal constraint and explicitly improves hash learning. Experiments on two widely used datasets demonstrate the effectiveness of the proposed method for weakly supervised image retrieval. The code is available at https://github.com/duyc168/weakly-supervised-hashing.
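
The attention step described in the abstract, where the image feature acts as the query and the tag embeddings are aggregated by relevance, can be illustrated with a small sketch. This is not the authors' implementation (their code is at the repository linked above). The scaled dot-product scoring, the sign-based binarization, the toy vectors, and the function names `attend_tags` and `binarize` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_tags(image_feature, tag_embeddings):
    """Use the image feature as the query over the tag embeddings.

    Returns the attention weights and the weighted sum of tag embeddings.
    """
    dim = len(image_feature)
    scale = math.sqrt(dim)  # assumption: scaled dot-product scoring
    scores = [
        sum(q * k for q, k in zip(image_feature, tag)) / scale
        for tag in tag_embeddings
    ]
    weights = softmax(scores)
    aggregated = [
        sum(w * tag[i] for w, tag in zip(weights, tag_embeddings))
        for i in range(dim)
    ]
    return weights, aggregated


def binarize(vector):
    """Assumed sign-based hash code with entries in {-1, +1}."""
    return [1 if v >= 0 else -1 for v in vector]


# Toy data (hypothetical): one image feature, one relevant and one noisy tag.
image = [1.0, 0.0, 1.0, 0.0]
tags = {
    "dog": [0.9, 0.1, 0.8, 0.0],    # content-relevant tag
    "lol": [-0.7, 0.6, -0.9, 0.2],  # noisy tag
}
weights, refined = attend_tags(image, list(tags.values()))
code = binarize(refined)
```

In this toy setting the relevant tag receives the larger attention weight, so the aggregated vector is dominated by it and the noisy tag contributes little to the supervision signal.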

Funder

National Natural Science Foundation of China

Fundamental Research Funds for the Central Universities

GPU cluster built by MCC Lab of Information Science and Technology Institution

Supercomputing Center of the USTC

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3589185

Cited by 1 article.

1. Supervised Semantic-Embedded Hashing for Multimedia Retrieval;Knowledge-Based Systems;2024-09