Affiliation:
1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, P. R. China
Abstract
Multimodal entity linking aims to link mentions to target entities in a multimodal knowledge graph. Current multimodal entity linking approaches mainly focus on the global fusion of text and image, and seldom fully explore the correlation between modalities. To improve the fusion of multimodal features, we propose a multimodal entity linking model based on a Multimodal Co-Attention Fusion strategy. This strategy enables text and image to guide each other during feature extraction, thereby fully exploiting the correlation between modalities and improving fine-grained feature fusion. Furthermore, we design a Transformer-based candidate entity generation strategy, which combines multiple candidate entity sets and re-ranks the candidates to obtain a high-quality candidate entity set. We conduct experiments on both domain-specific and public datasets, and the results demonstrate that our model performs well in candidate entity generation and multimodal feature fusion, outperforming state-of-the-art baseline models.
Funder
Hebei Natural Science Foundation
Science and Technology Project of Hebei Education Department
Publisher
World Scientific Pub Co Pte Ltd