Affiliation:
1. School of Computer Science and Technology, Zhoukou Normal University, Henan, Zhoukou 466001, China
2. School of Network Engineering, Zhoukou Normal University, Henan, Zhoukou 466001, China
Abstract
To address the complexity of multimodal environments and the inability of existing shallow network structures to achieve high-precision image-text retrieval, a cross-modal image-text retrieval method is proposed that combines efficient feature extraction with an interactive-learning convolutional autoencoder (CAE). First, the convolution kernel of the residual network is improved by incorporating two-dimensional principal component analysis (2DPCA) to extract image features, while text features are extracted through long short-term memory (LSTM) networks and word vectors, yielding efficient feature extraction for both modalities. Then, cross-modal retrieval of images and text is realized with the interactive-learning CAE: image and text features are fed into the two input terminals of the dual-modal CAE, and an image-text relationship model is obtained through interactive learning in the middle layer, enabling image-text retrieval. Finally, the proposed method is evaluated experimentally on the Flickr30K, MSCOCO, and Pascal VOC 2007 datasets. The results show that the proposed method achieves accurate image retrieval and text retrieval: the mean average precision (MAP) exceeds 0.3, and the areas under the precision-recall (PR) curves are larger than those of the comparison methods, demonstrating its applicability.
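The abstract does not detail how 2DPCA is incorporated into the residual network's convolution kernel, but the 2DPCA step itself is standard: the top eigenvectors of the image scatter matrix form a projection that compresses each image row-wise. A minimal NumPy sketch (the function name `two_dpca` and the shapes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def two_dpca(images, d):
    """Project a stack of images with 2DPCA.

    images: array of shape (M, m, n) -- M images of size m x n
    d:      number of leading eigenvectors to keep
    Returns the (n, d) projection matrix X and (M, m, d) features A_i @ X.
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Image scatter matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), shape (n, n)
    G = np.einsum('imn,imk->nk', centered, centered) / len(images)
    # G is symmetric; eigh returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(G)
    X = vecs[:, -d:][:, ::-1]   # d leading eigenvectors, largest first
    feats = images @ X          # per-image projected features, (M, m, d)
    return X, feats
```

Each image is thus reduced from m x n to m x d while the projection directions capture the dominant row-wise variance, which is what makes 2DPCA attractive as a lightweight feature-extraction stage.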
Funder
National Natural Science Foundation of China
Subject
Computer Science Applications,Software
Cited by
4 articles.