Affiliation:
1. Agricultural Research Service, Crop Improvement and Genetics Research Unit, U.S. Department of Agriculture, Albany, CA, United States
2. Agricultural Research Service, Corn Insects and Crop Genetics Research, U.S. Department of Agriculture, Ames, IA, United States
3. Department of Computer Science, Iowa State University, Ames, IA, United States
4. Department of Bioengineering, University of California, Berkeley, CA, United States
Abstract
Protein phosphorylation is a dynamic and reversible post‐translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein–protein interactions, and enzymatic activities has motivated extensive research into its functional implications. Experimental protein phosphorylation data in plants remain limited to a few species, creating a need for a scalable and accurate prediction method. Here, we present PhosBoost, a machine‐learning approach that combines protein language models with gradient‐boosting trees to predict protein phosphorylation sites from experimentally derived data. PhosBoost was trained on data from qPTMplants, a comprehensive plant phosphorylation database, and we compared its performance with two existing phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine sites, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (PhosBoost, PhosphoLingo, and DeepPhos: .78, .56, and .14, respectively) while maintaining a competitive area under the precision‐recall curve (AUPRC; .54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, whereas PhosBoost achieved a recall of .6. Despite the precision‐recall tradeoff, PhosBoost performs better when recall is prioritized and consistently provides more confident probability scores. A sequence‐based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence that PhosBoost models are transferable across species and scale to genome‐wide protein phosphorylation prediction. PhosBoost is freely and publicly available on GitHub.
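The abstract compares the classifiers by recall and by area under the precision‐recall curve (AUPRC). As an illustration only, the sketch below shows how these two metrics can be computed for per‐residue predictions. It assumes binary labels (1 = phosphorylated site, 0 = not phosphorylated), one probability score per candidate residue, a hypothetical decision threshold of 0.5, and the step‐wise "average precision" estimate of AUPRC. The paper may use a different threshold or AUPRC estimator, and the toy labels and scores are invented, not PhosBoost results.

```python
from typing import Sequence


def recall_at_threshold(labels: Sequence[int], scores: Sequence[float],
                        threshold: float = 0.5) -> float:
    """Fraction of true phosphosites (label 1) scored at or above threshold.

    The 0.5 default is an assumption for illustration, not a value
    taken from the PhosBoost paper.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("recall is undefined without positive sites")
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    return true_pos / positives


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Sites are ranked by descending score, and precision is accumulated at
    each rank where a true phosphosite is recovered. Tied scores are not
    grouped, which is a simplification.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("AUPRC is undefined without positive sites")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this rank
    return total / positives


# Invented toy data: six candidate S/T/Y residues, three of them phosphorylated.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print(round(recall_at_threshold(labels, scores), 3))  # 0.667
print(round(average_precision(labels, scores), 3))    # 0.806
```

Recall depends on the chosen threshold, while AUPRC summarizes ranking quality across all thresholds. This is why a classifier can score higher on recall yet similar or slightly lower on AUPRC, as the abstract reports for serine and threonine sites.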
Funder
Agricultural Research Service
Subject
Plant Science; Biochemistry, Genetics and Molecular Biology (miscellaneous); Ecology; Ecology, Evolution, Behavior and Systematics
Cited by
1 article.