Author:
Hartley Sophia M., Tiernan Kelly A., Ahmetaj Gjina, Cretu Adriana, Zhuang Yan, Zimmer Marc
Abstract
AlphaFold2 and RoseTTAFold can predict, based solely on sequence, whether a GFP-like protein will post-translationally form a chromophore (the part of the protein responsible for fluorescence). Their training has taught them not only protein structure and folding, but also chemistry. AlphaFold2 and RoseTTAFold were used to determine the structures of 21 GFP-like fluorescent proteins whose sequences will post-translationally form a chromophore and of 23 GFP-like non-fluorescent proteins that lack the residues required to form a chromophore. The resultant structures were mined for a series of geometric measurements that are crucial to chromophore formation. Statistical analysis of these measurements showed that both programs conclusively distinguished between chromophore-forming and non-chromophore-forming proteins. A clear distinction between sequences capable of forming a chromophore and those that lack the residues required for chromophore formation can be obtained by examining a single measurement: the RMSD of the overlap of the central alpha helices of the crystal structure of S65T GFP and the AlphaFold2-determined structure. Only 10 of the 578 GFP-like proteins in the PDB have no chromophore, yet when AlphaFold2 and RoseTTAFold are presented with the sequences of 44 GFP-like proteins that are not in the PDB, they fold the proteins in such a way that one can unequivocally distinguish between those that can and cannot form a chromophore.
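The single measurement highlighted in the abstract, the RMSD of two overlapped central helices, can be illustrated with a minimal sketch. The abstract does not say which atoms, residue range, or superposition software the authors used, so everything here is an assumption: the function name `kabsch_rmsd` is hypothetical, the inputs are taken to be matched (N, 3) arrays of atom coordinates (for example, C-alpha atoms of the helix residues) extracted beforehand, and the Kabsch algorithm stands in for whatever overlap procedure the paper applied.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm).

    Assumption: row i of P and row i of Q are the same atom in the
    reference structure and in the predicted structure.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

Under these assumptions, one would call `kabsch_rmsd(ref_helix, pred_helix)` with the helix coordinates from the S65T GFP crystal structure and from a predicted model. Any threshold separating chromophore-forming from non-chromophore-forming sequences would have to come from the paper itself, not from this sketch.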
Publisher
Public Library of Science (PLoS)
Cited by
7 articles.