Abstract
AbstractProtein engineering techniques are key in designing novel catalysts for a wide range of reactions. Although approaches vary in their exploration of the sequence-structure-function paradigm, they are often hampered by the labor-intensive steps of protein expression and screening. In this work, we describe the development and testing of a high throughputin silicosequence-structure-function pipeline using AlphaFold2 and Fast Fourier Transform docking that is benchmarked with enantioselectivity and reactivity predictions for an ancestral sequence library of fungal flavin-dependent monooxygenases. The predicted enantioselectivities and reactivities correlate well with previously described screens of an experimentally available subset of these proteins and capture known changes in enantioselectivity across the phylogenetic tree representing ancestorial proteins from this family. With this pipeline established as our functional screen, we apply ensemble decision tree models and explainable AI techniques to build sequence function models and extract critical residues within the binding site and the second sphere residues around this site. We demonstrate that the top-identified key residues in the control of enantioselectivity and reactivity correspond to experimentally verified residues. Thein silicosequence-to-function pipeline serves as an accelerated framework to inform protein engineering efforts.
Publisher
Cold Spring Harbor Laboratory