Abstract
O-linked glycosylation of proteins is an essential post-translational modification in Homo sapiens, in which a sugar moiety is attached to the oxygen atom of a serine or threonine residue. This modification plays a pivotal role in various biological and cellular functions. Although every serine or threonine residue in a protein sequence is a potential site for O-linked glycosylation, not all of them are O-linked glycosylated. Furthermore, the modification is reversible. It is therefore important to characterize if and when O-linked glycosylation occurs. We propose a multi-layer perceptron-based approach, termed OglyPred-PLM, which leverages contextualized embeddings produced by the ProtT5-XL-UniRef50 protein language model to significantly improve the prediction of human O-linked glycosylation sites. OglyPred-PLM surpassed the performance of other existing O-linked glycosylation predictors on the independent benchmark dataset. This demonstrates that OglyPred-PLM is a powerful and unique computational tool for predicting O-linked glycosylation sites in proteins, and it should therefore accelerate the discovery of unknown O-linked glycosylation sites.
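The general idea of scoring each serine or threonine residue with a perceptron applied to its per-residue embedding can be sketched as follows. This is a minimal illustration and not the OglyPred-PLM implementation: the abstract does not give the layer sizes, activation functions, or training procedure, so the single hidden layer, the ReLU and sigmoid choices, and the helper names `candidate_sites`, `init_mlp`, and `predict_proba` are all assumptions. Random vectors stand in for real ProtT5-XL-UniRef50 embeddings (which are 1024-dimensional per residue), and the weights are untrained, so the printed scores carry no biological meaning.

```python
import math
import random

EMBED_DIM = 1024  # per-residue vector size of ProtT5-XL-UniRef50 embeddings


def candidate_sites(sequence):
    """Return 0-based positions of serine (S) and threonine (T) residues."""
    return [i for i, aa in enumerate(sequence) if aa in "ST"]


def init_mlp(rng, in_dim=EMBED_DIM, hidden=16):
    """Randomly initialise a one-hidden-layer perceptron (assumed shape, untrained)."""
    return {
        "W1": [[rng.gauss(0.0, 0.05) for _ in range(in_dim)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [rng.gauss(0.0, 0.05) for _ in range(hidden)],
        "b2": 0.0,
    }


def predict_proba(params, x):
    """Forward pass for one embedding: ReLU hidden layer, then a sigmoid output."""
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
sequence = "MASTPLSGT"  # toy sequence, not a real protein

# Placeholder embeddings: one random vector per residue. In practice these
# would come from the protein language model, not from a random generator.
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in sequence]

params = init_mlp(rng)
sites = candidate_sites(sequence)
scores = [predict_proba(params, embeddings[i]) for i in sites]

for pos, score in zip(sites, scores):
    # Report residues with 1-based numbering, as is conventional for proteins.
    print(f"{sequence[pos]}{pos + 1}: {score:.3f}")
```

Only the serine and threonine positions are scored, which mirrors the observation above that these residues are the potential sites while only some of them are actually modified.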
Publisher
Cold Spring Harbor Laboratory