Abstract
Rapid advancements in protein structure prediction methods have ushered in a new era of abundant and accurate structural data, providing opportunities to analyse proteins at a scale that has not been possible before. Here we show that features derived solely from predicted structures can be used to understand in vivo protein behaviour using data-driven methods. We found that these features were predictive of in vivo protein production for a set of designed antibodies, enabling identification of high-quality designs. Following on from this result, we calculated these features for a diverse set of ≈500,000 predicted structures, and our analysis showed systematic variation between proteins from different organisms to such an extent that the tree of life could be recapitulated from these data. Given the high degree of functional constraint around the chemistry of proteins, this result is surprising, and could have important implications for the design and engineering of novel proteins.
Publisher
Cold Spring Harbor Laboratory