Abstract
Over the last year, there have been substantial improvements in protein structure prediction, particularly in methods like DeepMind’s AlphaFold2 (AF2) that exploit deep learning strategies. Here we report a new CATH-Assign protocol, which we use to analyse the first tranche of AF2 models predicted for 21 model organisms, and we discuss the insights these models provide into the nature of protein structure space. We analyse models that are of good quality and have no unusual structural characteristics, i.e., features rarely seen in experimental structures. For the ∼370,000 models that meet these criteria, we observe that 92% can be assigned to evolutionary superfamilies in CATH. The remaining domains cluster into 2,367 putative novel superfamilies. Detailed manual analysis of a subset of 618 of these, each having at least one human relative, revealed some extremely remote homologies and some further unusual features, but 26 could be confirmed as novel superfamilies, and one of these has an alpha-beta propeller architectural arrangement never seen before. By clustering both experimental and predicted AF2 domain structures into distinct ‘global fold’ groups, we find that the new AF2 models in CATH increase the information on structural diversity by 36%. This expansion in structural diversity will help to reveal associated functional diversity not previously detected. Our novel CATH-Assign protocol scales well and will be able to harness the huge expansion in structural data promised by DeepMind (at least 100 million models), providing more comprehensive coverage of even the most diverse superfamilies and helping to rationalise evolutionary changes in their functions.
Publisher
Cold Spring Harbor Laboratory