Protein–protein interaction site prediction using random forest proximity distance

Published: 2020-11-19 Issue: 01 Volume: 19 Page: 2050042
ISSN: 0219-7200
Container-title: Journal of Bioinformatics and Computational Biology
Language: en
Short-container-title: J. Bioinform. Comput. Biol.

Author:

Qiu Zhijun¹,², Liu Qingjie¹

Affiliation:

1. College of Food and Bioengineering, Henan University of Science and Technology, Luoyang, P. R. China

2. Henan Engineering Research Center of Food Microbiology, Luoyang 471023, P. R. China

Abstract

A front-end method based on random forest proximity distance (PD) is used to screen the test set and thereby improve protein–protein interaction site (PPIS) prediction. Distance metrics are assessed under the assumption that a higher-quality distance definition leads to better classification performance. On an independent test set, statistical inference on the numerical results shows that PD outperforms the Mahalanobis and cosine distances. Because PD depends on the tree composition of the random forest model, an iterative method is designed to optimize it: the tree composition is adjusted by changing the size of the training set. The iterative method yields two PD metrics, 75PD and 50PD. On two independent test sets, 75PD achieved higher Matthews correlation coefficient (MCC) and F1 scores than the PD produced by the original training set, and the differences were statistically significant. All numerical experiments show that the closer the test data are to the training data, the better the predictor performs. These results indicate that the iterative method can optimize the proximity distance definition and that the distance information provided by PD can be used to indicate the reliability of prediction results.
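The proximity distance and the reported metrics can be illustrated with a short Python sketch. It assumes the common definition of random forest proximity (the fraction of trees in which two samples fall into the same leaf) and takes PD = 1 - proximity. The exact PD formula, the way a test sample's distance to the training set is aggregated (here, the mean PD to its k nearest training samples), and the screening threshold are assumptions, not details taken from the paper, and all function names are hypothetical. Leaf indices of shape (n_samples, n_trees) can be obtained from scikit-learn's `RandomForestClassifier.apply`; toy arrays are used here so the sketch runs with NumPy alone. MCC and F1 are computed from confusion-matrix counts with their standard formulas.

```python
import numpy as np


def proximity_distance(leaves_a, leaves_b):
    """Pairwise random forest proximity distance (assumed: PD = 1 - proximity).

    leaves_a: (n_a, n_trees) leaf indices per sample and tree.
    leaves_b: (n_b, n_trees) leaf indices from the same forest.
    Returns an (n_a, n_b) array.
    """
    leaves_a = np.asarray(leaves_a)
    leaves_b = np.asarray(leaves_b)
    # Broadcast to (n_a, n_b, n_trees): True where both samples share a leaf.
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    # Proximity = fraction of trees in which the two samples share a leaf.
    proximity = same_leaf.mean(axis=2)
    return 1.0 - proximity


def distance_to_training(pd_test_train, k=5):
    """Hypothetical screening score: mean PD to the k nearest training samples."""
    k = min(k, pd_test_train.shape[1])
    nearest = np.sort(pd_test_train, axis=1)[:, :k]
    return nearest.mean(axis=1)


def mcc_f1(y_true, y_pred):
    """Matthews correlation coefficient and F1 score for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) > 0 else 0.0
    return mcc, f1


# Toy leaf indices: 3 training samples, 2 test samples, 3 trees.
train_leaves = np.array([[1, 4, 7], [1, 5, 7], [2, 5, 8]])
test_leaves = np.array([[1, 4, 7], [3, 6, 9]])

pd_matrix = proximity_distance(test_leaves, train_leaves)
scores = distance_to_training(pd_matrix, k=2)
# Keep only test samples close to the training data (threshold is hypothetical).
keep = scores <= 0.5
print(pd_matrix)
print(scores, keep)
```

Because PD depends on which trees the forest contains, refitting the forest on a training subset of a different size changes every distance; this is the lever the iterative method adjusts, although its exact procedure is not reproduced here.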

Funder

National Natural Science Foundation of China

Publisher

World Scientific Pub Co Pte Ltd

Subject

Computer Science Applications, Molecular Biology, Biochemistry

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0219720020500420

Cited by 4 articles

1. PPSNO: A Feature-Rich SNO Sites Predictor by Stacking Ensemble Strategy from Protein Sequence-Derived Information. Interdisciplinary Sciences: Computational Life Sciences, 2024-01-11.

2. Combining deep graph convolutional networks and PRSA to enhance protein-protein interaction site prediction. 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2022-10-09.

3. Chilling injury mechanism of hardy kiwifruit (Actinidia arguta) was revealed by proteome of label-free techniques. Journal of Food Biochemistry, 2021-08-13.

4. Predicting S-nitrosylation proteins and sites by fusing multiple features. Mathematical Biosciences and Engineering, 2021.