Authors:
Mishra Soumya, Looger Loren L., Porter Lauren L.
Abstract
Extant fold-switching proteins remodel their secondary structures and change their functions in response to cellular stimuli, regulating biological processes and affecting human health. Despite their biological importance, these proteins remain understudied. Few representative examples of fold switchers are available in the Protein Data Bank, and they are difficult to predict. In fact, all 96 experimentally validated extant fold switchers were discovered by chance. Thus, predictive methods are needed to expedite the discovery and characterization of more of these shapeshifting proteins. Previous approaches require a solved structure or all-atom simulations, which greatly constrains their use. Here, we propose a high-throughput, sequence-based method for predicting extant fold switchers that transition from α-helix in one conformation to β-strand in the other. This method leverages two previous observations: (1) α-helix ↔ β-strand prediction discrepancies from JPred4 are a robust predictor of fold switching, and (2) the fold-switching regions (FSRs) of some extant fold switchers have different secondary structure propensities when expressed in isolation (isolated FSRs) than when expressed within the context of their parent protein (contextualized FSRs). Combining these two observations, we ran JPred4 on the sequences of isolated and contextualized FSRs from 14 known extant fold switchers and found α-helix ↔ β-strand prediction discrepancies in every case. To test the robustness of this finding, we randomly selected regions of proteins not expected to switch folds (single-fold proteins) and found significantly fewer α-helix ↔ β-strand prediction discrepancies (p < 4.2 × 10⁻²⁰, Kolmogorov-Smirnov test). Combining these discrepancies with the overall percentage of predicted secondary structure, we developed a classifier that identifies extant fold switchers with a Matthews correlation coefficient (MCC) of 0.70.
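To make the idea of an α-helix ↔ β-strand prediction discrepancy concrete, the sketch below counts residues predicted as helix in one secondary structure string and as strand in the other. It assumes both strings cover the same FSR residues and use a JPred4-style alphabet ('H' helix, 'E' strand, '-' coil); the function name and the example strings are hypothetical illustrations, not real JPred4 output or the authors' code.

```python
def helix_strand_discrepancies(isolated: str, contextualized: str) -> int:
    """Count residues predicted 'H' in one string and 'E' in the other.

    Hypothetical helper: assumes both per-residue predictions span the same
    fold-switching region and use the alphabet 'H' (helix), 'E' (strand),
    '-' (coil).
    """
    if len(isolated) != len(contextualized):
        raise ValueError("predictions must cover the same residues")
    # A position counts only if one prediction is helix and the other strand;
    # helix/coil or strand/coil mismatches are ignored.
    return sum(1 for a, b in zip(isolated, contextualized) if {a, b} == {"H", "E"})


# Made-up predictions for a 12-residue region (illustrative only).
iso = "--HHHHHHH---"
ctx = "--EEEE--HH--"
print(helix_strand_discrepancies(iso, ctx))  # → 4
```

Helix/coil mismatches are deliberately not counted here, since the method described above is concerned specifically with helix-to-strand transitions.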
Although this classifier had a high false negative rate (6/14), its false positive rate was very low (1/211), suggesting that it can be used to predict a subset of extant fold switchers from billions of available genomic sequences.
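The reported error counts can be checked against the stated MCC. The sketch below assumes that the 14 known fold switchers are the positives (6 missed, so 8 true positives) and that the 211 single-fold regions are the negatives (1 misclassified, so 210 true negatives); this reading of the fractions is an assumption, not something the abstract states explicitly.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any marginal total is zero.
    return num / den if den else 0.0


# Assumed counts: 14 fold switchers with 6 false negatives,
# 211 single-fold regions with 1 false positive.
tp, fn = 14 - 6, 6
fp, tn = 1, 211 - 1
print(round(matthews_corrcoef(tp, fp, tn, fn), 2))  # → 0.7
```

Under these assumptions, MCC = (8·210 − 1·6) / √(9·14·211·216) ≈ 0.699, which agrees with the reported value of 0.70.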
Publisher
Cold Spring Harbor Laboratory