Author:
Wu Ruidong, Ding Fan, Wang Rui, Shen Rui, Zhang Xiwen, Luo Shitong, Su Chenpeng, Wu Zuofan, Xie Qi, Berger Bonnie, Ma Jianzhu, Peng Jian
Abstract
Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such as with orphan proteins or fast-evolving proteins like antibodies, and a protein typically folds in a natural setting from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein’s folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and antibodies that tend to have noisy MSAs due to fast evolution. Our study fills a much-encountered gap in structure prediction and brings us a step closer to understanding protein folding in nature.
Publisher
Cold Spring Harbor Laboratory
Cited by
238 articles.