Abstract
Generating knowledge graph embeddings (KGEs) to represent the entities (nodes) and relations (edges) of large-scale knowledge graph datasets is a challenging problem in representation learning. This is primarily because the embeddings (vector representations) needed to encode the full scope of data in a large heterogeneous graph must be high-dimensional: orienting a large number of vectors requires a lot of space, which is obtained by projecting the embeddings into higher dimensions. This approach does not scale, especially since the knowledge graph is expected to grow as more data is incorporated. Conversely, constraining the embeddings to fewer dimensions is problematic, because the limited space may be insufficient to orient the many vectors, which degrades inference on downstream tasks such as link prediction, where the embeddings are used to predict the likelihood that a link exists between two or more entities in the knowledge graph. The problem is especially acute for large biomedical knowledge graphs, which relate many diverse entities, such as genes, diseases, signaling pathways and biological functions, that are clinically relevant to applying KGs in drug discovery. Biomedical knowledge graphs are therefore much larger than typical benchmark knowledge graph datasets, which makes it very difficult to generate good-quality embeddings that capture the latent semantic structure of the graph. Attempts to circumvent this challenge by increasing the embedding dimensionality often run into hardware limitations, as generating high-dimensional embeddings is computationally expensive and oftentimes infeasible.
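To make the scaling argument concrete, the sketch below computes the memory occupied by the embedding parameters alone, assuming one dense float32 vector (4 bytes per value) per entity and per relation, as in a TransE-style model. The function name and the graph sizes are hypothetical and are not taken from the datasets used in this work; real training also needs memory for gradients, optimizer state and negative samples, so these figures are only a lower bound.

```python
def embedding_memory_gib(num_entities: int, num_relations: int, dim: int,
                         bytes_per_value: int = 4) -> float:
    """Memory (GiB) for one dense `dim`-dimensional vector per entity and relation.

    Assumes float32 values (4 bytes each) and ignores gradients, optimizer
    state and any other training-time buffers.
    """
    num_values = (num_entities + num_relations) * dim
    return num_values * bytes_per_value / 2**30


if __name__ == "__main__":
    # Hypothetical graph: 2**20 (about one million) entities, 1024 relation types.
    for d in (256, 1024, 4096):
        print(d, round(embedding_memory_gib(2**20, 1024, d), 2))
```

The footprint grows linearly in both the number of entities and the dimensionality, so quadrupling the dimension quadruples the memory needed just to store the embeddings.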
To represent the latent structure of such large-scale knowledge graphs (KGs) in practice, our work proposes an ensemble learning model in which the full knowledge graph is sampled into several smaller subgraphs and a KGE model generates embeddings for each subgraph. The link predictions from the KGE models trained on the individual subgraphs are then aggregated into a consolidated set of link predictions across the full knowledge graph. Experimental results on four open-source biomedical knowledge graph datasets demonstrate significant improvements in rank-based evaluation metrics for both task-specific and general link prediction.
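The ensemble pipeline described above can be sketched as follows. The abstract does not specify how subgraphs are sampled or how predictions are aggregated, so this sketch assumes a uniform random partition of the triples and a plain mean of the scores from whichever models can score a candidate; trained KGE models are replaced by a hypothetical `table_scorer` stand-in with made-up scores. All function names here are illustrative and are not the authors' code. The sketch also computes mean reciprocal rank (MRR) and Hits@k, two common rank-based link prediction metrics.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
# A trained per-subgraph KGE model, abstracted as a scoring function that
# returns None for triples it cannot score (e.g. entity not in its subgraph).
Scorer = Callable[[str, str, str], Optional[float]]


def partition_triples(triples: List[Triple], k: int, seed: int = 0) -> List[List[Triple]]:
    """Assign each triple to one of k subgraphs uniformly at random (assumed scheme)."""
    rng = random.Random(seed)
    parts: List[List[Triple]] = [[] for _ in range(k)]
    for triple in triples:
        parts[rng.randrange(k)].append(triple)
    return parts


def ensemble_scores(scorers: List[Scorer], head: str, rel: str,
                    candidates: List[str]) -> Dict[str, float]:
    """Average each candidate tail's score over the models able to score it."""
    combined: Dict[str, float] = {}
    for cand in candidates:
        scores = [s for s in (f(head, rel, cand) for f in scorers) if s is not None]
        # Candidates that no model can score are pushed to the bottom of the ranking.
        combined[cand] = mean(scores) if scores else float("-inf")
    return combined


def rank_of(true_tail: str, scores: Dict[str, float]) -> int:
    """1-based rank of the true tail; higher is better, ties broken optimistically."""
    return 1 + sum(1 for s in scores.values() if s > scores[true_tail])


def mrr_and_hits(ranks: List[int], k: int = 10) -> Tuple[float, float]:
    """Mean reciprocal rank and Hits@k over a list of ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


def table_scorer(table: Dict[Triple, float]) -> Scorer:
    """Hypothetical stand-in for a trained model: looks up precomputed scores."""
    return lambda h, r, t: table.get((h, r, t))


if __name__ == "__main__":
    # Made-up scores from two subgraph models for the query (drug_X, treats, ?).
    m1 = table_scorer({("drug_X", "treats", "disease_1"): 0.9,
                       ("drug_X", "treats", "disease_2"): 0.4})
    m2 = table_scorer({("drug_X", "treats", "disease_2"): 0.8,
                       ("drug_X", "treats", "disease_3"): 0.1})
    cands = ["disease_1", "disease_2", "disease_3"]
    agg = ensemble_scores([m1, m2], "drug_X", "treats", cands)
    ranks = [rank_of("disease_1", agg), rank_of("disease_2", agg)]
    print(ranks, mrr_and_hits(ranks, k=1))  # [1, 2] (0.75, 0.5)
```

Averaging only over the models that can score a candidate avoids penalizing entities that happen to be missing from some subgraphs. If the per-subgraph models produce scores on different scales, normalizing scores per model or averaging ranks instead of raw scores would be reasonable alternatives.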
Publisher
Cold Spring Harbor Laboratory