Affiliation:
1. Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology , Barcelona 08003, Spain
2. Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF) , Barcelona 08003, Spain
Abstract
Abstract
Motivation
Protein sequence alignments are essential to structural, evolutionary and functional analysis, but their accuracy is often limited by sequence similarity unless molecular structures are available. Protein structures predicted at experimental grade accuracy, as achieved by AlphaFold2, could therefore have a major impact on sequence analysis.
Results
Here, we find that multiple sequence alignments estimated on AlphaFold2 predictions are almost as accurate as alignments estimated on experimental structures and significantly closer to the structural reference than sequence-based alignments. We also show that AlphaFold2 structural models of relatively low quality can be used to obtain highly accurate alignments. These results suggest that, besides structure modeling, AlphaFold2 encodes higher-order dependencies that can be exploited for sequence analysis.
Availability and implementation
All data, analyses and results are available on Zenodo (https://doi.org/10.5281/zenodo.7031286). The code and scripts have been deposited in GitHub (https://github.com/cbcrg/msa-af2-nf) and the various containers in (https://cloud.sylabs.io/library/athbaltzis/af2/alphafold, https://hub.docker.com/r/athbaltzis/pred).
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
Centre for Genomic Regulation
Spanish Plan Nacional
Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa’
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
12 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献