Abstract
SUMOylation is a reversible post-translational protein modification in which SUMOs (small ubiquitin-like modifiers) covalently attach to a specific lysine residue of the target protein. This process is vital for many cellular events. Aberrant SUMOylation is associated with several diseases, including Alzheimer’s, cancer, and diabetes. Therefore, accurate identification of SUMOylation sites is essential to understanding cellular processes and the pathologies that arise from their disruption. We present three deep neural architectures, SUMOnets, that take the peptide sequence centered on the candidate SUMOylation site as input and predict whether the lysine could be SUMOylated. Each of these models (SUMOnet-1, -2, and -3) relies on a different composition of deep sequential learning architectural units, such as bidirectional Gated Recurrent Units (biGRUs) and convolutional layers. We evaluate these models on the benchmark dataset with three different representations of the input peptide sequence. SUMOnet-3 achieves 75.8% AUPR and 87% AUC scores, corresponding to approximately 5% improvement over the closest state-of-the-art SUMOylation predictor and 16% improvement over GPS-SUMO, the most widely adopted tool. We also evaluate the models on challenging subsets of the test data, formed based on the absence and presence of known SUMOylation motifs. Even though the performance of all methods degrades in these cases, SUMOnet-3 remains the best predictor.
Availability and Implementation
The SUMOnet-3 framework is available as an open-source project and a Python library at https://github.com/berkedilekoglu/SUMOnet.
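To make the input described in the abstract concrete, the following is a minimal sketch of how peptide windows centered on candidate lysines might be extracted from a protein sequence. It does not use the SUMOnet library or reproduce its preprocessing. The function name `lysine_windows`, the flank size, and the padding character `X` are all assumptions chosen for illustration, since the abstract does not state the window length or how sequence ends are handled.

```python
from typing import List, Tuple


def lysine_windows(sequence: str, flank: int = 10, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in `sequence`.

    Each window has length 2 * flank + 1 with the lysine in the middle.
    `flank` and `pad` are illustrative assumptions, not values from the paper.
    """
    # Pad both ends so lysines near the termini still yield full-length windows.
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i sits at index i + flank in the padded string,
            # so the centered window starts at index i.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


# Hypothetical toy sequence, not a real protein.
for position, window in lysine_windows("MKTAYIAKQR", flank=3):
    print(position, window)
```

Each window produced this way would then be converted to a numeric representation before being passed to a classifier; the specific representations evaluated by the authors are not detailed in the abstract.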
Publisher
Cold Spring Harbor Laboratory