SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data

Published:2023-10-11 Issue:4 Volume:5
ISSN:2631-9268
Container-title:NAR Genomics and Bioinformatics
language:en
Author:

Xiao Di¹, Lin Michael², Liu Chunlei¹, Geddes Thomas A¹³⁴, Burchfield James G³⁴, Parker Benjamin L⁵, Humphrey Sean J³⁴⁶, Yang Pengyi¹²³

Affiliation:

1. Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia

2. School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia

3. Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia

4. School of Environmental and Life Sciences, The University of Sydney, Sydney, NSW 2006, Australia

5. Centre for Muscle Research, Department of Anatomy and Physiology, School of Biomedical Sciences, Melbourne, VIC 3010, Australia

6. Murdoch Children’s Research Institute, The Royal Children’s Hospital, Melbourne, VIC 3052, Australia

Abstract

A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positive) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a ‘pseudo-positive’ learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce an ensemble deep learning model (‘SnapKin’) by incorporating the above two learning strategies into a ‘snapshot’ ensemble learning algorithm. We propose SnapKin, an ensemble deep learning method, for predicting substrates of kinases from large-scale phosphoproteomics data. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
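For readers unfamiliar with the term, the generic 'snapshot' ensemble idea can be sketched as follows: a single model is trained with a learning rate that is annealed and restarted in cycles, a copy of the weights is saved at the end of each cycle, and the saved copies are averaged at prediction time. The sketch below is an illustration under stated assumptions and is not SnapKin's implementation: it uses a tiny logistic regression on made-up two-feature data instead of a deep network on phosphoproteomics data, it omits the pseudo-positive and resampling strategies, and all names (`cosine_cycle_lr`, `train_snapshots`, `ensemble_predict`) and hyperparameters are hypothetical.

```python
import math
import random


def cosine_cycle_lr(step, steps_per_cycle, lr_max):
    """Cosine-annealed learning rate that restarts at lr_max every cycle."""
    pos = (step % steps_per_cycle) / steps_per_cycle
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * pos))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, x):
    """Probability of the positive class under one (weights, bias) model."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_snapshots(X, y, n_cycles=3, steps_per_cycle=200, lr_max=0.5, seed=0):
    """Run SGD once and save a snapshot of the model at the end of each cycle."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    snapshots = []
    for step in range(n_cycles * steps_per_cycle):
        lr = cosine_cycle_lr(step, steps_per_cycle, lr_max)
        i = rng.randrange(len(X))
        # Gradient of the log loss with respect to the logit.
        err = predict((w, b), X[i]) - y[i]
        w = [wi - lr * err * xi for wi, xi in zip(w, X[i])]
        b -= lr * err
        if (step + 1) % steps_per_cycle == 0:
            snapshots.append((list(w), b))  # learning rate is near zero here
    return snapshots


def ensemble_predict(snapshots, x):
    """Average the predicted probabilities of all saved snapshots."""
    return sum(predict(s, x) for s in snapshots) / len(snapshots)


# Made-up toy data: positives in the upper-right quadrant, negatives mirrored.
data_rng = random.Random(1)
positives = [[data_rng.uniform(0.5, 1.5), data_rng.uniform(0.5, 1.5)]
             for _ in range(20)]
negatives = [[-u, -v] for u, v in positives]
X = positives + negatives
y = [1] * len(positives) + [0] * len(negatives)

snapshots = train_snapshots(X, y)
print(len(snapshots), ensemble_predict(snapshots, [2.0, 2.0]))
```

The appeal of this design is that the ensemble members come from one training run, so the cost is close to that of training a single model. The authors' actual code is in the repository linked in the abstract.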

Publisher

Oxford University Press (OUP)

Subject

Applied Mathematics, Computer Science Applications, Genetics, Molecular Biology, Structural Biology

Link

https://academic.oup.com/nargab/article-pdf/5/4/lqad099/52946137/lqad099.pdf

References: 57 articles.

1. Protein phosphorylation: a major switch mechanism for metabolic regulation; Humphrey; Trends Endocrinol. Metab., 2015

2. Multi-omic profiling reveals dynamics of the phased progression of pluripotency; Yang; Cell Syst., 2019

3. CDK substrate phosphorylation and ordering the cell cycle; Swaffer; Cell, 2016

4. Phosphoproteomics of primary AML patient samples reveals rationale for AKT combination therapy and p53 context to overcome selinexor resistance; Emdal; Cell Rep., 2022

5. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites; Blom; J. Mol. Biol., 1999