
Improving the chances of successful protein structure determination with a random forest classifier

Published: 2014-02-15 Volume: 70 Issue: 3 Pages: 627–635
ISSN: 1399-0047
Journal: Acta Crystallographica Section D Biological Crystallography
Journal (abbreviated): Acta Cryst D Biol Crystallogr

Authors:

Samad Jahandideh, Lukasz Jaroszewski, Adam Godzik

Abstract

Obtaining diffraction-quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472–2482] was developed. XtalPred classifies proteins into five 'crystallization classes' based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features, such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface, is tested. The new XtalPred-RF (random forest) achieves a significant improvement in the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted targets.
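The abstract lists hydrophobicity and amino-acid composition among the sequence-derived features fed to the classifier. As a rough illustration of what such features look like, the sketch below computes a GRAVY hydrophobicity score (mean Kyte-Doolittle hydropathy) and the 20-residue composition for a whole sequence. This is an assumption-laden simplification: XtalPred-RF computes several of these features on the *predicted protein surface*, which is not modelled here, and the choice of the Kyte-Doolittle scale and the helper names (`gravy`, `composition`, `feature_vector`) are illustrative, not taken from the paper.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)  # fixed residue order for the feature vector


def standard_residues(seq):
    """Upper-case the sequence and keep only the 20 standard residues."""
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = standard_residues(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def composition(seq):
    """Fraction of each standard amino acid in the sequence."""
    residues = standard_residues(seq)
    return {aa: residues.count(aa) / len(residues) for aa in AMINO_ACIDS}


def feature_vector(seq):
    """Hypothetical feature layout: [length, GRAVY, 20 composition fractions]."""
    comp = composition(seq)
    return [len(standard_residues(seq)), gravy(seq)] + [comp[aa] for aa in AMINO_ACIDS]


if __name__ == "__main__":
    print(round(gravy("AILV"), 3))  # 3.575
    print(len(feature_vector("MKTAYIAK")))  # 22
```

Vectors like these, one per target, are the kind of input a classifier such as a random forest would be trained on.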
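XtalPred-RF replaces XtalPred's simple statistical scoring with a random forest. The generic technique is: grow many decision trees, each on a bootstrap resample of the training targets, consider only a random subset of features at every split, and classify by majority vote (the vote fractions can serve as a crude class probability). The from-scratch sketch below shows that technique on plain Python lists; it is not the authors' implementation, and the defaults (25 trees, depth 5, sqrt(p) features per split) are assumptions. Labels can be any hashable non-tuple value, for example crystallization classes 1 to 5.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini impurity
    among the given features, or None if no split improves on the parent."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow a tree. A leaf is a class label; an internal node is the tuple
    (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Random-forest step: only a random subset of features is tried at this node.
    feature_ids = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_sub, rng),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_sub, rng),
    )


def predict_tree(node, x):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=5, max_features=None, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.max_features = max_features  # None -> floor(sqrt(number of features))
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        p = len(X[0])
        n_sub = min(p, self.max_features or max(1, int(math.sqrt(p))))
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw len(X) rows with replacement.
            idx = [rng.randrange(len(X)) for _ in range(len(X))]
            self.trees.append(
                build_tree([X[i] for i in idx], [y[i] for i in idx], 0, self.max_depth, n_sub, rng)
            )
        return self

    def predict_proba(self, x):
        """Fraction of trees voting for each class label."""
        votes = Counter(predict_tree(tree, x) for tree in self.trees)
        return {label: count / len(self.trees) for label, count in votes.items()}

    def predict(self, x):
        """Majority vote over all trees."""
        votes = Counter(predict_tree(tree, x) for tree in self.trees)
        return votes.most_common(1)[0][0]
```

In practice one would use an established library implementation; the point here is only to make the bootstrap, per-split feature sampling and voting steps explicit.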
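The abstract's evaluation asks two questions of a retrospective target list: how many targets could have been skipped, and whether every family that yielded a first structure would still have done so. The exact procedure used for the 271 JCSG families is not given in the abstract, so the sketch below uses a deliberately simple, hypothetical rule (keep only targets predicted in classes 1 to `max_class`). It also reports the lift in success rate, where a lift of 2.0 corresponds to the "100% (i.e. twofold)" improvement wording. `Target`, `evaluate_filter` and the toy data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Target:
    family: str           # family label, e.g. a Pfam accession
    predicted_class: int  # 1 = most likely to crystallize ... 5 = least likely
    solved: bool          # whether a structure was actually obtained


def evaluate_filter(targets, max_class):
    """Hypothetical retrospective check: keep only targets predicted in
    classes 1..max_class and compare against the full target list."""
    kept = [t for t in targets if t.predicted_class <= max_class]
    families_all = {t.family for t in targets if t.solved}
    families_kept = {t.family for t in kept if t.solved}
    rate_all = sum(t.solved for t in targets) / len(targets)
    rate_kept = sum(t.solved for t in kept) / len(kept) if kept else 0.0
    return {
        "reduction": 1 - len(kept) / len(targets),  # fraction of targets skipped
        "families_solved_all": len(families_all),
        "families_solved_kept": len(families_kept),
        # lift of 2.0 = success rate doubled, i.e. a 100% improvement
        "lift": rate_kept / rate_all if rate_all else float("nan"),
    }


# Made-up toy data for illustration only (not the JCSG/PSI-2 targets).
TOY_TARGETS = [
    Target("F1", 1, True), Target("F1", 4, False), Target("F1", 5, False),
    Target("F2", 2, True), Target("F2", 5, False),
    Target("F3", 1, False), Target("F3", 3, True), Target("F3", 4, False),
    Target("F4", 2, False), Target("F4", 5, False),
]

if __name__ == "__main__":
    print(evaluate_filter(TOY_TARGETS, max_class=3))
```

On this toy set, keeping classes 1 to 3 skips half of the targets, all three solved families stay solved, and the success rate among kept targets doubles (lift 2.0). Real numbers depend entirely on the classifier and the target list.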

Publisher

International Union of Crystallography (IUCr)

Subject

General Medicine, Structural Biology

Link

http://journals.iucr.org/d/issues/2014/03/00/wd5222/wd5222.pdf

References (48 articles)

1. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

2. Predicting protein crystallization propensity from protein sequence

3. The Protein Data Bank

4. Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.

Cited by 48 articles.

1. The Nicotiana tabacum UGT89A2 enzyme catalyzes the glycosylation of di- and trihydroxylated benzoic acid derivatives; Phytochemistry; 2024-10

2. Risk assessment of bridge construction investigated using random forest algorithm; Scientific Reports; 2024-09-09

3. Artificial Intelligence Assisted Pharmaceutical Crystallization; Crystal Growth & Design; 2024-05-03

4. Machine learning in crystallography and structural science; Acta Crystallographica Section A Foundations and Advances; 2024-01-26

5. Deep learning applications in protein crystallography; Acta Crystallographica Section A Foundations and Advances; 2024-01-01