Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding

Author:

Salcedo Mariah V. (1), Gravel Nathan (2), Keshavarzi Abbas (3), Huang Liang-Chin (2), Kochut Krzysztof J. (3), Kannan Natarajan (1, 2)

Affiliation:

1. Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States of America

2. Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America

3. School of Computing, University of Georgia, Athens, GA, United States of America

Abstract

The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied “dark” members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to sample diverse aspects of node context within a KG flexibly. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing.
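The idea of a random walk constrained by a regular pattern over node types can be illustrated with a small sketch. This is not the authors' RegPattern2Vec implementation: the toy graph, the node names, the pattern `K P+ W` (kinase, one or more proteins, pathway), and the function `constrained_walk` are all hypothetical and chosen only for illustration. The pattern is encoded as a small finite automaton, and at each step the walk may move only to neighbors whose type the automaton accepts.

```python
import random

# Toy heterogeneous graph. All node names are illustrative placeholders,
# not data from the paper. Types: K = kinase, P = protein, W = pathway.
NODE_TYPE = {
    "kinaseA": "K", "kinaseB": "K",
    "proteinX": "P", "proteinY": "P",
    "pathway1": "W",
}
EDGES = {
    "kinaseA": ["proteinX", "proteinY"],
    "kinaseB": ["proteinY"],
    "proteinX": ["kinaseA", "proteinY", "pathway1"],
    "proteinY": ["kinaseA", "kinaseB", "proteinX", "pathway1"],
    "pathway1": ["proteinX", "proteinY"],
}

# Automaton for the assumed node-type pattern  K P+ W.
# State 0 = start, 1 = after K, 2 = after K P+, 3 = after W (accepting).
DFA = {0: {"K": 1}, 1: {"P": 2}, 2: {"P": 2, "W": 3}}
ACCEPTING = {3}


def constrained_walk(start, max_len, rng):
    """Return one walk whose node types match the pattern, or None on failure."""
    state = DFA[0].get(NODE_TYPE[start])
    if state is None:
        return None  # the start node's type cannot begin the pattern
    walk = [start]
    while state not in ACCEPTING and len(walk) < max_len:
        allowed = DFA.get(state, {})
        # Keep only neighbors whose type has a transition from the current state.
        candidates = [n for n in EDGES[walk[-1]] if NODE_TYPE[n] in allowed]
        if not candidates:
            return None  # dead end: no neighbor fits the pattern
        nxt = rng.choice(candidates)
        state = allowed[NODE_TYPE[nxt]]
        walk.append(nxt)
    return walk if state in ACCEPTING else None


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(constrained_walk("kinaseA", max_len=6, rng=rng))
```

In random-walk embedding methods, walks collected this way are typically passed to a skip-gram style model to learn node vectors; that step is omitted here. Because each walk is an explicit path through typed nodes, it can also be inspected directly, which is the kind of interpretability the abstract attributes to the collected walks.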

Funder

National Institutes of Health

Publisher

PeerJ

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience

Cited by 2 articles.

1. Heterogeneous network approaches to protein pathway prediction;Computational and Structural Biotechnology Journal;2024-12

2. Informatic challenges and advances in illuminating the druggable proteome;Drug Discovery Today;2024-03
