Affiliation:
1. LIX, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France
2. CNRS, Institut Pasteur UMR 3528, Paris 75015, France
Abstract
Motivation
Protein structure is organized hierarchically, with the secondary structure elements (α-helix, β-strand and loop) as the basic building blocks. Determining secondary structure elements usually requires knowledge of the whole structure. In numerous experimental circumstances, however, the protein structure is only partially known. Detecting secondary structures from such partial structures is hampered by the lack of information about connecting residues along the primary sequence.
Results
We introduce a new methodology to estimate the secondary structure elements from the values of local distances and angles between the protein atoms. Our method uses a message passing neural network, named Sequoia, to predict these elements automatically. The network takes as input the topology of the protein graph, in which the vertices are protein residues and the edges are weighted by distances and by pseudo-dihedral angles that generalize the backbone angles ϕ and ψ. Every pair of residues, whether or not the two residues are covalently bonded along the primary sequence, is tagged with this distance and angle information. Sequoia detects the secondary structure elements with an F1-score larger than 80% in most cases when α-helices and β-strands are predicted. In contrast to approaches classically used in structural biology, such as DSSP, Sequoia is able to capture the variations of geometry at the interface of adjacent secondary structure elements. Owing to its general modeling framework, Sequoia can handle graphs containing only Cα atoms, which is particularly useful for low-resolution structural input and in the context of electron microscopy development.
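To make the graph input described above more concrete, the following is a minimal sketch of how a residue graph could be assembled from Cα coordinates alone. It is not taken from the Sequoia source code. The function names (`dihedral`, `build_residue_graph`), the 10 Å distance cutoff, and in particular the choice of the four atoms Cα(i-1), Cα(i), Cα(j), Cα(j+1) to define the pseudo-dihedral angle of an edge (i, j) are illustrative assumptions; the definition actually used by Sequoia may differ.

```python
import math
from itertools import combinations


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # atan2 form of the dihedral: numerically stable near 0 and 180 degrees.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def build_residue_graph(ca_coords, cutoff=10.0):
    """Return {(i, j): (distance, pseudo_dihedral_or_None)} for i < j.

    ASSUMPTIONS (not from the paper): edges are kept only below `cutoff`
    (in the units of the coordinates, typically angstroms), and the angle of
    edge (i, j) is the dihedral of CA(i-1), CA(i), CA(j), CA(j+1). For
    j == i + 1 this reduces to the usual pseudo-dihedral of four consecutive
    CA atoms. The angle is None when a flanking residue is missing.
    """
    n = len(ca_coords)
    edges = {}
    for i, j in combinations(range(n), 2):
        d = math.dist(ca_coords[i], ca_coords[j])
        if d > cutoff:
            continue
        angle = None
        if i >= 1 and j + 1 < n:
            angle = dihedral(ca_coords[i - 1], ca_coords[i],
                             ca_coords[j], ca_coords[j + 1])
        edges[(i, j)] = (d, angle)
    return edges
```

In this sketch, an edge is created for every pair of residues that are close in space, regardless of their separation along the sequence, which mirrors the idea that residue pairs are tagged independently of their covalent connectivity. The resulting dictionary could then be converted into the edge list and edge-feature arrays expected by a message passing neural network library.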
Availability and implementation
Sequoia source code can be found at https://github.com/Khalife/Sequoia with additional documentation.
Supplementary information
Supplementary data are available at Bioinformatics Advances online.
Publisher
Oxford University Press (OUP)