Affiliation
1. University of Science and Technology of China
2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Abstract
On many popular social websites, images are accompanied by metadata such as textual tags, which carry semantic information relevant to the image and can be used to supervise representation learning for image retrieval. However, these user-provided tags are usually noisy, so the main challenge lies in mining the useful information they contain. Many previous works simply treat all tags equally when generating supervision, which inevitably distracts network learning. To address this, we propose a new framework, termed Weakly Supervised Hashing with Reconstructive Cross-modal Attention (WSHRCA), to learn compact visual-semantic representations with more reliable supervision for the retrieval task. Specifically, for each image-tag pair, the weak supervision from tags is refined by cross-modal attention, which takes the image feature as the query to aggregate the most content-relevant tags. As a result, tags with relevant content become more prominent while noisy tags are suppressed, which provides more accurate supervisory information. To improve the effectiveness of hash learning, the image embedding in WSHRCA is reconstructed from the hash code; the reconstructed embedding is further optimized by a cross-modal constraint, which explicitly improves hash learning. Experiments on two widely used datasets demonstrate the effectiveness of the proposed method for weakly supervised image retrieval. The code is available at https://github.com/duyc168/weakly-supervised-hashing.
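The cross-modal attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product scoring, the function name `attend_tags`, and the toy vectors are all assumptions made for illustration. The abstract only states that the image feature serves as the query for aggregating content-relevant tags.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attend_tags(image_feat, tag_embs):
    """Hypothetical helper: weight tags by their relevance to the image.

    image_feat: (d,) image feature used as the attention query.
    tag_embs:   (n, d) one embedding per user-provided tag.
    Returns the attention weights (n,) and the weighted tag summary (d,).
    Scaled dot-product scoring is an assumption, not taken from the paper.
    """
    d = image_feat.shape[0]
    scores = tag_embs @ image_feat / np.sqrt(d)  # similarity of each tag to the image
    weights = softmax(scores)                    # relevant tags receive larger weights
    return weights, weights @ tag_embs           # aggregate tags into one vector


# Toy data (assumed): tag 0 agrees with the image, tag 1 is unrelated,
# and tag 2 points in the opposite direction (a "noisy" tag).
image_feat = np.array([2.0, 0.0, 0.0, 0.0])
tag_embs = np.array([
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [-2.0, 0.0, 0.0, 0.0],
])

weights, tag_summary = attend_tags(image_feat, tag_embs)
print(weights)
print(tag_summary)
```

In this toy setting the tag aligned with the image receives the largest weight and the opposing tag the smallest, which is the suppression effect the abstract describes. In a trained model the features would come from learned image and text encoders rather than hand-written vectors.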
Funder
National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities
GPU cluster built by MCC Lab of Information Science and Technology Institution
Supercomputing Center of the USTC
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
Cited by
1 article.