Abstract
A protein superfold is a protein fold that is observed in at least three distinct, non-homologous protein families. Structural classification studies have revealed a limited number of prevalent superfolds alongside many infrequently occurring folds. They have also shown that in α/β-type superfolds the C-terminal β-strand tends to lie at the edge of the β-sheet, while the N-terminal β-strand is often found in the middle. Whether these observations reflect evolutionary sampling bias or physical interactions remains unclear. This article offers a physics-based explanation for these observations, focusing on pure parallel β-sheet topologies. Our investigation is grounded in three established structural rules based on physical interactions. We define “frustration-free topologies” as topologies that satisfy all three rules simultaneously; topologies that cannot are termed “frustrated topologies.” Our findings reveal three things: frustration-free topologies represent only a fraction of all theoretically possible patterns; they strongly favor placing the C-terminal β-strand at the edge of the β-sheet and the N-terminal β-strand in the middle; and they overlap substantially with superfolds. We also used a lattice protein model to investigate sequence-structure relationships systematically. The results show that frustration-free structures are highly designable, whereas frustrated structures are poorly designable. These findings suggest that superfolds are highly designable because they lack frustration, and that the preference for C-terminal β-strands at the edge of the β-sheet is a direct consequence of frustration-free topologies. These insights not only deepen our understanding of sequence-structure relationships but also have significant implications for de novo protein design.

Author summary
A protein superfold is a protein fold that appears in at least three different non-homologous protein families. Superfolds are distinctive in accommodating multiple functions within a single fold, a feature not typically seen in other folds. Structural classification studies have led to two notable observations. First, a limited number of common superfolds contrasts with a larger variety of less frequent folds. Second, α/β-type superfolds show a recurring pattern in which the C-terminal β-strand often occupies the edge of the β-sheet, while the N-terminal β-strand usually lies in the middle. Whether these patterns stem from evolutionary sampling bias or from physical interactions remains unclear. This article provides a physics-based explanation for these observations, concentrating on pure parallel β-sheet topologies. The insights gained from this research are important for understanding the relationship between protein sequences and structures, and are expected to contribute to the de novo design of new proteins.
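Judging how strongly a set of topologies "favors" an edge position requires a topology-blind baseline for comparison. The Python sketch below is not the authors' method and does not implement their three structural rules or their lattice model. It rests on several simplifying assumptions. A "topology" is taken to be only the left-to-right order of n parallel strands, and crossover handedness is ignored. A strand order and its reverse are treated as the same sheet. The sketch then counts how often the N-terminal strand (1) and the C-terminal strand (n) land at a sheet edge. The function name `edge_fractions` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def edge_fractions(n):
    """Uniform baseline over strand orders of an n-strand parallel sheet.

    Assumption (not from the paper): a topology is only the spatial order
    of strands 1..n; crossover handedness and structural rules are ignored.
    Returns (number of distinct orders, fraction with the N-terminal strand
    at an edge, fraction with the C-terminal strand at an edge).
    """
    seen = set()
    n_edge = c_edge = 0
    for order in permutations(range(1, n + 1)):
        # Reading a sheet left-to-right or right-to-left gives the same
        # arrangement, so keep one canonical representative per pair.
        key = min(order, order[::-1])
        if key in seen:
            continue
        seen.add(key)
        edges = {key[0], key[-1]}      # strands occupying the two sheet edges
        n_edge += int(1 in edges)      # N-terminal strand at an edge?
        c_edge += int(n in edges)      # C-terminal strand at an edge?
    total = len(seen)
    return total, Fraction(n_edge, total), Fraction(c_edge, total)
```

For example, `edge_fractions(4)` returns `(12, Fraction(1, 2), Fraction(1, 2))`. For any n ≥ 2, both fractions equal 2/n, and the N- and C-terminal strands are treated symmetrically. Under this uniform baseline, therefore, neither terminus is preferentially placed at the edge. Any asymmetry like the one reported above must come from factors the sketch omits, such as the structural rules that define frustration-free topologies.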
Publisher
Cold Spring Harbor Laboratory