Rigorous assessment and integration of the sequence and structure based features to predict hot spots

Published: 2011-07-29 Issue: 1 Volume: 12 Page:
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Author:

Chen Ruoying, Chen Wenjing, Yang Sixiao, Wu Di, Wang Yong, Tian Yingjie, Shi Yong

Abstract

Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction is therefore increasingly important for understanding the essence of protein interactions and for narrowing the search space for drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed.

Results: In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA (relative accessible surface area) and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, conservation score and sequence entropy do not differ significantly between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) both features differ significantly between hot spots and non-hot spots. Second, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting the hot spots of two protein complexes.

Conclusion: Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integrating features with machine learning methods can remarkably improve the predictive performance for hot spots.
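The precision, recall, and F1 figures quoted in the abstract are standard binary-classification metrics. The following is a minimal Python sketch of how such metrics are computed, assuming hot spots are labeled 1 and non-hot spots 0. The function names and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive class (label 1).

    Assumption: 1 marks a hot spot, 0 marks a non-hot spot.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return precision, recall, f1


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy labels (hypothetical, not taken from the paper's datasets).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 1]
print(precision_recall_f1(toy_true, toy_pred))

# Consistency check on the abstract's reported values:
# precision 0.69 and recall 0.68 give an F1 that rounds to 0.68.
print(round(f1_from(0.69, 0.68), 2))
```

The last line only checks that the reported F1 score is arithmetically consistent with the reported precision and recall. It does not reproduce the SVM experiments or the AUC value.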

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-12-311.pdf

Cited by 7 articles.

1. Protein binding hot spots prediction from sequence only by a new ensemble learning method; Amino Acids; 2017-08-01

2. Pedestrian detection based on the privileged information; Neural Computing and Applications; 2016-11-12

3. Prediction of Hot Spots Based on Physicochemical Features and Relative Accessible Surface Area of Amino Acid Sequence; Intelligent Computing Theories and Application; 2016

4. Classify a Protein Domain Using SVM Sigmoid Kernel; Advances in Intelligent Systems and Computing; 2014

5. Computational identification of epitopes in the glycoproteins of novel bunyavirus (SFTS virus) recognized by a human monoclonal antibody (MAb 4-5); Journal of Computer-Aided Molecular Design; 2013-06