Scalable Multi-grained Cross-modal Similarity Query with Interpretability

Published:2021-05-31 Issue:3 Volume:6 Page:280-293
ISSN:2364-1185
Container-title:Data Science and Engineering
language:en
Short-container-title:Data Sci. Eng.

Author:

Zhu Mingdong, Shen Derong, Xu Lixin, Wang Xianfang

Abstract

Cross-modal similarity query has become a prominent research topic for managing multimodal datasets such as images and texts. Existing research generally focuses on query accuracy by designing complex deep neural network models and rarely considers query efficiency and interpretability at the same time, both of which are vital properties of a cross-modal semantic query processing system on large-scale datasets. In this work, we investigate multi-grained common semantic embedding representations of images and texts and integrate an interpretable query index into the deep neural network by developing a novel Multi-grained Cross-modal Query with Interpretability (MCQI) framework. The main contributions are as follows: (1) By integrating coarse-grained and fine-grained semantic learning models, a multi-grained cross-modal query processing architecture is proposed to ensure the adaptability and generality of query processing. (2) To capture the latent semantic relation between images and texts, the framework combines LSTM with an attention mechanism, which enhances the accuracy of cross-modal queries and lays the foundation for interpretable query processing. (3) An index structure and a corresponding nearest neighbor query algorithm are proposed to boost the efficiency of interpretable queries. (4) A distributed query algorithm is proposed to improve the scalability of the framework. Compared with state-of-the-art methods on widely used cross-modal datasets, the experimental results show the effectiveness of our MCQI approach.

Funder

National Natural Science Foundation of China

Training Plan of Young Backbone Teachers in Universities of Henan Province

Publisher

Springer Science and Business Media LLC

Subject

Computer Science Applications, Computational Mechanics

Link

https://link.springer.com/content/pdf/10.1007/s41019-021-00162-4.pdf

Reference: 40 articles.

1. Peng Y, Huang X, Zhao Y (2018) An overview of cross-media retrieval: Concepts, methodologies, benchmarks and challenges. IEEE Trans Circuits Syst Video Technol 28(9):2372–2385

2. He X, Peng Y, Xi L (2019) A new benchmark and approach for fine-grained cross-media retrieval. In: 27th ACM international conference on multimedia, ACM. pp 1740–1748