A Comparison of Mutual Information, Linear Models and Deep Learning Networks for Protein Secondary Structure Prediction

Published: 2023-10
Volume: 18
Issue: 8
Pages: 631-646
ISSN: 1574-8936
Container title: Current Bioinformatics
Language: en
Short container title: CBIO

Author:

Mahmoud Saida Saad Mohamed¹˒², Portelli Beatrice¹, D'Agostino Giovanni¹, Pollastri Gianluca³, Serra Giuseppe¹, Fogolari Federico¹

Affiliation:

1. Department of Mathematics, Computer Science and Physics, University of Udine, Udine, Italy

2. Faculty of Science, Cairo University, Cairo, Egypt

3. School of Computer Science, University College Dublin, Dublin, Ireland

Abstract

Background: Over the last several decades, predicting protein structures from amino acid sequences has been a core task in bioinformatics. The most successful current methods employ multiple sequence alignments and predict structure with excellent accuracy. These predictions take advantage of all the amino acids observed at a given position and their frequencies. However, the effect of a single amino acid substitution in a specific protein tends to be hidden by the alignment profile. For this reason, single-sequence-based predictions continue to attract interest even though accurate multiple-alignment methods are available: using a single sequence ensures that the effects of a substitution are not confounded by homologous sequences.

Objective: This work aims to understand how the single-sequence secondary structure prediction for a residue is influenced by the surrounding residues, and how different prediction methods use single-sequence information to predict the structure.

Methods: We compare mutual information, the coefficients of two linear models, and three deep learning networks. For the deep learning algorithms, we use DeepLIFT analysis to assess the effect of each residue at each position on the prediction.

Results: Mutual information and linear models quantify direct effects, whereas DeepLIFT applied to deep learning networks quantifies both direct and indirect effects.

Conclusion: Our analysis shows how different network architectures use the information in single protein sequences and highlights how they differ from linear models. In particular, the deep learning implementations weigh context and single-position information differently, with the best results obtained using the BERT architecture.
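As an illustration of the mutual information comparison named in the Methods, the following is a minimal sketch of how one might estimate the mutual information between the amino acid at offset k from a residue and that residue's secondary structure class. It is not the authors' implementation: the abstract does not state the estimator, the window size, or the structure alphabet, so the plug-in frequency estimator, the function names `mutual_information` and `mi_by_offset`, and the toy sequences below are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


def mi_by_offset(sequences, structures, max_offset):
    """MI between the residue at position i + k and the structure label at i.

    `sequences` and `structures` are parallel lists of equal-length strings
    (hypothetical inputs: one-letter amino acids and per-residue labels).
    Returns a dict mapping each offset k in [-max_offset, max_offset] to MI.
    """
    result = {}
    for k in range(-max_offset, max_offset + 1):
        pairs = []
        for seq, ss in zip(sequences, structures):
            for i in range(len(seq)):
                j = i + k
                if 0 <= j < len(seq):
                    pairs.append((seq[j], ss[i]))
        result[k] = mutual_information(pairs)
    return result


if __name__ == "__main__":
    # Toy data only (assumption): H = helix, E = strand, C = coil.
    toy_sequences = ["AAEELLKKGG", "VVIIYYGGPP"]
    toy_structures = ["HHHHHHHHCC", "EEEEEECCCC"]
    for offset, value in mi_by_offset(toy_sequences, toy_structures, 2).items():
        print(offset, round(value, 3))
```

On a real data set the plug-in estimator is biased upward for small samples, so a correction or a shuffled-label baseline would likely be needed before comparing offsets.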

Publisher

Bentham Science Publishers Ltd.

Subject

Computational Mathematics,Genetics,Molecular Biology,Biochemistry

References: 44 articles.

1. Anfinsen C.B.; Principles that govern the folding of protein chains. Science 1973,181(4096),223-230

2. Rost B.; Sander C.; Schneider R.; Redefining the goals of protein secondary structure prediction. J Mol Biol 1994,235(1),13-26

3. Jumper J.; Evans R.; Pritzel A.; Highly accurate protein structure prediction with AlphaFold. Nature 2021,596(7873),583-589

4. Zhou Y.; Karplus M.; Interpreting the folding kinetics of helical proteins. Nature 1999,401(6751),400-403

5. Ozkan S.B.; Wu G.A.; Chodera J.D.; Dill K.A.; Protein folding by zipping and assembly. Proc Natl Acad Sci USA 2007,104(29),11987-11992