KSFinder—a knowledge graph model for link prediction of novel phosphorylated substrates of kinases

Published: 2023-10-06 Volume: 11 Page: e16164
ISSN: 2167-8359
Container-title: PeerJ
Language: en

Author:

Anandakrishnan Manju¹, Ross Karen E.², Chen Chuming¹, Shanker Vijay¹, Cowart Julie¹, Wu Cathy H.¹,²

Affiliation:

1. Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America

2. Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America

Abstract

Background: Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools for predicting phosphorylation cover fewer than 50% of known human kinases. They rely on local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships among proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent associations of proteins in a network comprising 85% of the known human kinases. We also postulate the potential roles of two understudied kinases based on their substrate predictions from KSFinder.

Methods: KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes as low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained on the embedded vectors to discern kinase-substrate links. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder’s generalization capability on four different datasets and compare its performance with that of other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 “dark” kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P, along with manual curation, to search for literature evidence for the predictions. In a case study, we perform functional enrichment analysis for two dark kinases, HIPK3 and CAMKK1, using their predicted substrates.

Results: KSFinder shows improved performance over other kinase-substrate prediction models and generalizes across different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All 17 of these predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8–0.9, and two at 0.7–0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while those of CAMKK1 substrates include lipid storage regulation and glucose homeostasis.

Conclusions: KSFinder outperforms current kinase-substrate prediction tools and offers higher kinase coverage. The strategically developed negatives give KSFinder a superior generalization ability. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.
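
The classification step described in the abstract (an MLP scoring a kinase-substrate pair from its embedded vectors, with predictions then read off by probability range) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from KSFinder itself: the function names `mlp_probability` and `score_band`, the single hidden layer with ReLU, the concatenation of the two vectors as input, the toy embeddings and weights, and the handling of exact boundary values. Only the probability ranges (≥0.7 split into three bands, and ≤0.3 for negatives) come from the abstract.

```python
import math


def mlp_probability(kinase_vec, substrate_vec, hidden_w, hidden_b, out_w, out_b):
    """Score one kinase-substrate pair with a one-hidden-layer MLP.

    The pair is represented by concatenating the two embedding vectors
    (an assumption; the abstract does not say how pairs are encoded).
    Returns a sigmoid probability in (0, 1).
    """
    x = list(kinase_vec) + list(substrate_vec)
    # Hidden layer: affine transform followed by ReLU.
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    # Output layer: affine transform followed by a sigmoid.
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))


def score_band(p):
    """Map a probability to the ranges quoted in the abstract.

    Boundary handling (for example, where exactly 0.9 falls) is an assumption.
    """
    if p > 0.9:
        return ">0.9"
    if p >= 0.8:
        return "0.8-0.9"
    if p >= 0.7:
        return "0.7-0.8"
    if p <= 0.3:
        return "negative (<=0.3)"
    return "uncertain"


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings and hand-picked weights.
    kinase = [0.2, -0.1]
    substrate = [0.4, 0.3]
    hidden_w = [[0.5, -0.2, 0.1, 0.3], [-0.3, 0.8, 0.2, -0.1]]
    hidden_b = [0.0, 0.1]
    out_w = [1.2, -0.7]
    out_b = 0.05
    p = mlp_probability(kinase, substrate, hidden_w, hidden_b, out_w, out_b)
    print(f"probability={p:.3f} band={score_band(p)}")
```

In a real pipeline the embeddings would come from the trained knowledge graph embedding model and the weights from training on positive and negative pairs; the hand-written values here exist only to make the sketch runnable.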

Funder

National Institute of General Medical Sciences

National Cancer Institute of the National Institutes of Health

The National Science Foundation

Publisher

PeerJ

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience

Link

https://peerj.com/articles/16164.pdf
