DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins

Published:2019-11-18 Issue:7 Volume:36 Page:2105-2112
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Zhang Chengxin¹, Zheng Wei¹, Mortuza S M¹, Li Yang¹², Zhang Yang¹³

Affiliation:

1. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

3. Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA

Abstract

Motivation: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information for modeling the structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved.

Results: We developed DeepMSA, a new open-source method for sensitive MSA construction, in which homologous sequences and alignments are created from multiple whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was used to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which increased the accuracy of long-range contact prediction by up to 24.4% compared to the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in Q3 accuracy. All these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library.

Availability and implementation: https://zhanglab.ccmb.med.umich.edu/DeepMSA/

Supplementary information: Supplementary data are available at Bioinformatics online.

Funder

National Institutes of Health

National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz863/30960243/btz863.pdf

References: 42 articles.

1. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks;Adhikari;Bioinformatics,2018

2. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs;Altschul;Nucleic Acids Res,1997

3. Improved protein contact predictions with the MetaPSICOV2 server in CASP12;Buchan;Proteins,2018

4. FFPred 3: feature-based function prediction for all Gene Ontology domains;Cozzetto;Sci. Rep,2016

5. Profile hidden Markov models;Eddy;Bioinformatics,1998

Cited by 157 articles.

1. Topology and functional characterization of major outer membrane proteins of Treponema maltophilum and Treponema lecithinolyticum;Molecular Oral Microbiology;2024-09-12

2. A privacy-preserving approach for cloud-based protein fold recognition;Patterns;2024-09

3. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis;Computers in Biology and Medicine;2024-09

4. African Swine Fever Virus Protein–Protein Interaction Prediction;Viruses;2024-07-20

5. MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training;2024-06-12