MM-Transformer: A Transformer-Based Knowledge Graph Link Prediction Model That Fuses Multimodal Features
Author:
Wang Dongsheng1, Tang Kangjie1, Zeng Jun1, Pan Yue1, Dai Yun2, Li Huige1, Han Bin1
Affiliation:
1. School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
2. Department of Information Management, Jiangsu Justice Police Vocational College, Nanjing 211805, China
Abstract
Multimodal knowledge graph completion requires integrating information from multiple modalities (such as images and text) into the structural representation of entities to improve link prediction. However, most existing studies overlook the interaction between different modalities and the symmetry of the modal fusion process. To address this issue, this paper proposes MM-Transformer, a Transformer-based knowledge graph link prediction model that fuses multimodal features. Different modal encoders are employed to extract structural, visual, and textual features, and symmetrical hybrid key-value calculations are performed on features from different modalities based on the Transformer architecture. The similarities of textual tags to structural tags and to visual tags are calculated and aggregated separately, and multimodal entity representations are modeled and optimized to reduce the heterogeneity of the representations. The experimental results show that, compared with the current multimodal SOTA method, MKGformer, MM-Transformer improves the Hits@1 and Hits@10 evaluation metrics by 1.17% and 1.39%, respectively, demonstrating that the proposed method effectively addresses multimodal feature fusion in the knowledge graph link prediction task.
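For readers unfamiliar with the Hits@1 and Hits@10 metrics reported above, the following is a minimal sketch of how Hits@k is commonly computed in link prediction. It is not taken from the paper: the function names `rank_of_target` and `hits_at_k`, the toy ranks, and the tie-handling rule (only strictly higher scores count against the target) are assumptions made for illustration.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """Return the 1-based rank of the candidate at index `target`.

    Assumes a higher score means a more plausible candidate. Ties are
    resolved in the target's favor (an illustrative choice, not the paper's).
    """
    target_score = scores[target]
    return 1 + sum(1 for s in scores if s > target_score)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test queries whose correct entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy example: ranks of the correct entity for five hypothetical test triples.
example_ranks = [1, 3, 12, 2, 1]
print(hits_at_k(example_ranks, 1))   # 2 of 5 queries ranked first
print(hits_at_k(example_ranks, 10))  # 4 of 5 queries ranked in the top 10
```

Published evaluations typically also filter out other known-true triples before ranking; that step is omitted here for brevity.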
Funder
National Natural Science Foundation of China; Open Fund for Innovative Research on Ship Overall Performance