Learning the local landscape of protein structures with convolutional neural networks


Kulikova Anastasiya V., Diaz Daniel J., Loy James M., Ellington Andrew D., Wilke Claus O.


Abstract

The fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict the wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely to be correct. Predictions of the consensus are less accurate, and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and are likely targets for protein engineering.
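As an illustration of the last point, the following minimal sketch shows how high-confidence mis-predictions of the wild type could be picked out once per-site amino acid probabilities are available. It is not the authors' implementation: the function name `flag_high_confidence_mispredictions`, the dictionary-based input format, the toy probabilities, and the confidence threshold of 0.8 are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def flag_high_confidence_mispredictions(
    site_probs: List[Dict[str, float]],
    wild_type: str,
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, str, str, float]]:
    """Return (position, wild_type_aa, predicted_aa, confidence) for every
    site whose top-scoring amino acid differs from the wild type and whose
    predicted probability is at least `threshold`. Positions are 1-based."""
    if len(site_probs) != len(wild_type):
        raise ValueError("need one probability distribution per residue")

    flagged = []
    for pos, (probs, wt) in enumerate(zip(site_probs, wild_type), start=1):
        # The predicted residue is the one with the highest probability;
        # that probability serves as the network confidence for the site.
        predicted = max(probs, key=probs.get)
        confidence = probs[predicted]
        if predicted != wt and confidence >= threshold:
            flagged.append((pos, wt, predicted, confidence))
    return flagged


# Toy input: made-up probabilities for a three-residue protein "AVR".
example_probs = [
    {"A": 0.90, "G": 0.10},  # matches the wild type, not flagged
    {"L": 0.95, "V": 0.05},  # confident mismatch, flagged
    {"K": 0.55, "R": 0.45},  # mismatch but low confidence, not flagged
]
flagged = flag_high_confidence_mispredictions(example_probs, "AVR")
```

In this toy case only the second site is flagged, since it is the only one where the top prediction both disagrees with the wild type and clears the assumed threshold.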


Cold Spring Harbor Laboratory








