EDET: Entity Descriptor Encoder of Transformer for Multi-Modal Knowledge Graph in Scene Parsing

Published: 2023-06-14
Volume: 13, Issue: 12, Page: 7115
ISSN: 2076-3417
Journal: Applied Sciences
Language: English

Author:

Ma Sai¹, Wan Weibing¹, Yu Zedong¹, Zhao Yuming²

Affiliation:

1. Department of Computer, Shanghai University of Engineering Science, Shanghai 201620, China

2. Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China

Abstract

In scene parsing, a model must process complex multi-modal data from real scenes, such as images and their context, and discover the implicit connections among the objects in the scene. Because a knowledge graph stores both entity information and the relationships between entities, it is well suited to expressing the objects in a scene and the semantic relationships among them. In this paper, we propose a new multi-phase process for scene parsing tasks: first, a knowledge graph is used to align the multi-modal information, and then a graph-based model generates the results. We also designed a feature-engineering validation experiment on a deep-learning model to preliminarily verify the effectiveness of this approach. On this basis, we propose a knowledge representation method named Entity Descriptor Encoder of Transformer (EDET), which represents knowledge using both the entity itself and its internal attributes. EDET can be embedded into the transformer architecture to solve multi-modal scene parsing tasks. It aggregates the multi-modal attributes of entities, and results on scene graph generation and image captioning tasks show that EDET performs strongly in multi-modal settings. Finally, we applied the proposed method to an industrial scene, which confirmed its viability.
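The abstract describes EDET only at a high level: an entity is represented together with its internal, multi-modal attributes, and the result is fed into a transformer. The sketch below illustrates that general idea and is not the authors' implementation. It assumes that each entity and each of its per-modality attributes (for example a visual feature or a text label) are already embedded as vectors of the same dimension, and it uses parameter-free scaled dot-product attention plus a residual connection as the aggregation step. The function names `encode_entity` and `build_token_sequence` and the attribute keys are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_entity(entity: Vector, attributes: Dict[str, Vector]) -> Vector:
    """Hypothetical descriptor encoder: the entity embedding acts as the query,
    its attribute embeddings act as keys and values, and the attention-weighted
    summary of the attributes is added back to the entity (residual)."""
    d = len(entity)
    for name, a in attributes.items():
        if len(a) != d:
            raise ValueError(f"attribute {name!r} has dim {len(a)}, expected {d}")
    if not attributes:
        return list(entity)  # no attributes: the entity embedding is used as-is
    attrs = list(attributes.values())
    weights = softmax([dot(entity, a) / math.sqrt(d) for a in attrs])
    summary = [sum(w * a[i] for w, a in zip(weights, attrs)) for i in range(d)]
    return [e + s for e, s in zip(entity, summary)]


def build_token_sequence(
    entities: Dict[str, Vector], attributes: Dict[str, Dict[str, Vector]]
) -> List[Vector]:
    """Encode every entity with its own attributes; the output list (one vector
    per entity, in insertion order) could serve as transformer input tokens."""
    return [encode_entity(vec, attributes.get(name, {})) for name, vec in entities.items()]


if __name__ == "__main__":
    tokens = build_token_sequence(
        {"person": [1.0, 0.0], "helmet": [0.0, 1.0]},
        {"person": {"visual": [0.5, 0.5]}},
    )
    print(tokens)  # [[1.5, 0.5], [0.0, 1.0]]
```

With a single attribute the attention weight is 1, so the entity simply absorbs that attribute; with several, attributes more similar to the entity embedding receive more weight. A trained model would add learned query, key, and value projections and stack this inside transformer layers; those are omitted here to keep the sketch dependency-free.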

Funder

Science and Technology Innovation 2030—Major Project of “New Generation Artificial Intelligence”

Jiangxi Provincial Department of Science and Technology, China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/12/7115/pdf

References (30 articles)

1. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst.

2. Chang, D., Chen, M., Liu, C., Liu, L., Li, D., Li, W., Kong, F., Liu, B., Luo, X., and Qi, J. (2021, January 4–7). DiaKG: An annotated diabetes dataset for medical knowledge graph construction. Proceedings of the Knowledge Graph and Semantic Computing: Knowledge Graph Empowers New Infrastructure Construction: 6th China Conference, CCKS 2021, Guangzhou, China. Proceedings 6.

3. Wang, L.L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., Eide, D., Funk, K., Kinney, R., Liu, Z., and Merrill, W. (2020). CORD-19: The COVID-19 open research dataset. arXiv.

4. Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., and Bronstein, M.M. (2017, July 21–26). Geometric deep learning on graphs and manifolds using mixture model CNNs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.

5. Giles, C.L., Bollacker, K.D., and Lawrence, S. (1998, January 23–26). CiteSeer: An automatic citation indexing system. Proceedings of the Third ACM Conference on Digital Libraries, Pittsburgh, PA, USA.

Cited by 1 article.

1. Novel Object Captioning with Semantic Match from External Knowledge. Applied Sciences, 2023-07-04.