Author:
Jahandideh Samad, Jaroszewski Lukasz, Godzik Adam
Abstract
Obtaining diffraction-quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of a protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472–2482] was developed. XtalPred classifies proteins into five 'crystallization classes' based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features is tested: predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface. The new XtalPred-RF (random forest) achieves a significant improvement in the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted targets.
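As a rough illustration of the kind of sequence-derived features mentioned in the abstract (hydrophobicity and amino-acid composition), the sketch below computes a mean Kyte-Doolittle hydropathy (GRAVY) and fractional composition over a whole sequence. It is not the XtalPred-RF feature set: the function name `sequence_features` and the output keys are hypothetical, and the abstract describes features of the predicted protein surface, which this whole-sequence sketch does not attempt to predict.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq: str) -> dict:
    """Hypothetical helper: whole-sequence GRAVY and amino-acid fractions.

    Assumption: features are computed over the full sequence, not over a
    predicted surface as described in the abstract.
    """
    seq = seq.strip().upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")

    n = len(seq)
    counts = Counter(seq)

    # GRAVY: mean hydropathy over all residues.
    features = {"gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n}

    # Fractional composition, one entry per standard amino acid.
    for aa in KYTE_DOOLITTLE:
        features[f"frac_{aa}"] = counts[aa] / n
    return features


if __name__ == "__main__":
    feats = sequence_features("MKTAYIAKQR")
    print(round(feats["gravy"], 3), feats["frac_K"])
```

A feature dictionary of this shape could serve as one input row for a random-forest classifier; the choice of classifier settings and training labels is left open here because the abstract does not specify them.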
Publisher
International Union of Crystallography (IUCr)
Subject
General Medicine, Structural Biology
Cited by
48 articles.