Author:
Tang Jinle, Zhang Zhe, Zhan Jian, Zhou Yaoqi
Abstract
High-resolution protein structure determination by experimental techniques is notoriously costly and labor intensive. This problem has been largely solved by the arrival of deep-learning-based computational prediction with AlphaFold2, but only for proteins with enough naturally occurring homologous sequences. Here, we attempt to close the remaining gap by using artificially generated homologous sequences, selected for structural stability, as input for AlphaFold2. We showed that a single round of selection on deeply mutated sequences carrying only a few mutations each is sufficient to bring the accuracy of the predicted structures to better than 2 Å RMSD from their respective native structures for four of the five proteins tested. For three of the five proteins, the performance is even better than that of AlphaFold2 with naturally occurring sequences. For the only protein whose predicted structure deviates by more than 2 Å RMSD (2.92 Å), the deviation is due to a fully exposed (i.e., likely flexible) β-hairpin. The result supports a future of determining protein structures at low cost and with fast turnaround by integrating simple molecular biology experiments (deep mutational scanning and in vivo or in vitro selection) with high-throughput sequencing. The technique proposed here can be further extended to predict the structures of protein complexes as well as of proteins with posttranslational modifications.
Publisher
Cold Spring Harbor Laboratory