Cross-modal semantic autoencoder with embedding consensus

Published:2021-10-13 Issue:1 Volume:11 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Sun Shengzi,Guo Binghui,Mi Zhilong,Zheng Zhiming

Abstract

Cross-modal retrieval has become a popular topic, since multi-modal data is heterogeneous and the similarities between different forms of information deserve attention. Traditional single-modal methods reconstruct the original information but fail to consider the semantic similarity between different data. In this work, a cross-modal semantic autoencoder with embedding consensus (CSAEC) is proposed, which maps the original data to a low-dimensional shared space to retain semantic information. Considering the similarity between the modalities, an autoencoder is used to associate the feature projection with the semantic code vector. In addition, regularization and sparse constraints are applied to the low-dimensional matrices to balance reconstruction errors. The high-dimensional data is transformed into semantic code vectors. Different models are constrained by parameters to achieve denoising. Experiments on four multi-modal data sets show that the query results are improved and effective cross-modal retrieval is achieved. Further, CSAEC can also be applied to computer- and network-related fields such as deep learning and subspace learning. The model overcomes obstacles in traditional methods by using deep learning to convert multi-modal data into abstract representations, which yields better accuracy and better recognition results.
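
The idea of projecting each modality into a shared semantic space while keeping the data reconstructable can be illustrated with a small sketch. This is not the authors' CSAEC implementation: it assumes a generic linear, tied-weight semantic autoencoder objective, ||XW - S||^2 + lam ||X - SW^T||^2 + mu ||W||^2, fitted independently per modality. The function names, the hyperparameters lam and mu, and the synthetic "image" and "text" features are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_projection(X, S, lam=1.0, mu=1e-2):
    """Fit W (d x k) minimising
    ||X W - S||^2 + lam * ||X - S W^T||^2 + mu * ||W||^2.
    Setting the gradient to zero gives the Sylvester equation
    A W + W B = Q, solved here by symmetric eigendecompositions."""
    d = X.shape[1]
    A = X.T @ X + mu * np.eye(d)      # d x d, positive definite
    B = lam * (S.T @ S)               # k x k, positive semi-definite
    Q = (1.0 + lam) * (X.T @ S)       # d x k
    a_vals, U = np.linalg.eigh(A)
    b_vals, V = np.linalg.eigh(B)
    # In the eigenbases the equation decouples entrywise.
    W_tilde = (U.T @ Q @ V) / (a_vals[:, None] + b_vals[None, :])
    return U @ W_tilde @ V.T

def rank_gallery(query_codes, gallery_codes):
    """Return gallery indices sorted by cosine similarity, best first."""
    q = query_codes / np.linalg.norm(query_codes, axis=1, keepdims=True)
    g = gallery_codes / np.linalg.norm(gallery_codes, axis=1, keepdims=True)
    return np.argsort(-(q @ g.T), axis=1)

# Synthetic paired data (assumption): one-hot semantic codes mixed into
# two feature spaces of different dimension, plus small Gaussian noise.
rng = np.random.default_rng(0)
n, k, d_img, d_txt = 200, 4, 20, 30
labels = np.arange(n) % k
S = np.eye(k)[labels]                                   # n x k semantic codes
P_img = np.linalg.qr(rng.normal(size=(d_img, k)))[0].T  # k x d_img
P_txt = np.linalg.qr(rng.normal(size=(d_txt, k)))[0].T  # k x d_txt
X_img = S @ P_img + 0.05 * rng.normal(size=(n, d_img))
X_txt = S @ P_txt + 0.05 * rng.normal(size=(n, d_txt))

W_img = fit_projection(X_img, S)
W_txt = fit_projection(X_txt, S)

# Image-to-text retrieval in the shared k-dimensional space.
ranking = rank_gallery(X_img @ W_img, X_txt @ W_txt)
accuracy = np.mean(labels[ranking[:, 0]] == labels)
print("top-1 label agreement:", accuracy)
```

In this sketch the two modalities only meet through the common semantic codes S. The embedding consensus, sparse constraints and denoising described in the abstract are not modelled here.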

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-021-92750-7.pdf

References: 32 articles.

1. Nie, L., Zhao, Y.-L., Akbari, M., Shen, J. & Chua, T.-S. Bridging the vocabulary gap between health seekers and healthcare knowledge. IEEE Trans. Knowl. Data Eng. 27, 396–409 (2015).

2. Jacobs, D. W., Daume, H., Kumar, A. & Sharma, A. Generalized multiview analysis: A discriminative latent space. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (2012).

3. Masci, J., Bronstein, M. M., Bronstein, A. M. & Schmidhuber, J. Multimodal similarity-preserving hashing. IEEE Trans. Pattern Anal. Mach. Intell. 36, 824–830 (2014).

4. Zhen, Y. & Yeung, D. Y. Co-regularized hashing for multimodal data. In International Conference on Neural Information Processing Systems (2012).

5. Weston, J., Bengio, S. & Usunier, N. Wsabie: Scaling up to large vocabulary image annotation. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence (2011).