Abstract
Protein sequence design in the context of small molecules, nucleotides, and metals is critical to enzyme and small molecule binder and sensor design, but current state-of-the-art deep learning-based sequence design methods are unable to model non-protein atoms and molecules. Here, we describe a deep learning-based protein sequence design method called LigandMPNN that explicitly models all non-protein components of biomolecular systems. LigandMPNN significantly outperforms Rosetta and ProteinMPNN on native backbone sequence recovery for residues interacting with small molecules (63.3% vs. 50.4% & 50.5%), nucleotides (50.5% vs. 35.2% & 34.0%), and metals (77.5% vs. 36.0% & 40.6%). LigandMPNN generates not only sequences but also sidechain conformations to allow detailed evaluation of binding interactions. Experimental characterization demonstrates that LigandMPNN can generate small molecule and DNA-binding proteins with high affinity and specificity.
One-sentence summary
We present a deep learning-based protein sequence design method that allows explicit modeling of small molecule, nucleotide, metal, and other atomic contexts.
Publisher
Cold Spring Harbor Laboratory