Author:
Ben-Hur Asa, Noble William Stafford
Abstract
The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high-quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples make the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence-based features used for predicting protein-protein interactions.
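The two ways of choosing negative examples contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names, the toy protein identifiers, and the localization annotations are all hypothetical, and "different localization patterns" is assumed here to mean that the two annotation sets share no compartment.

```python
import itertools
import random


def localization_negatives(proteins, localization, positives):
    """Non-interacting pairs whose localization annotations do not overlap.

    Assumption: "different localization" means disjoint annotation sets.
    """
    negatives = []
    for a, b in itertools.combinations(sorted(proteins), 2):
        if frozenset((a, b)) in positives:
            continue  # known interaction, never a negative
        if localization[a].isdisjoint(localization[b]):
            negatives.append((a, b))
    return negatives


def random_negatives(proteins, positives, n, seed=0):
    """Uniform sample of up to n pairs that are not known to interact."""
    rng = random.Random(seed)
    candidates = [
        (a, b)
        for a, b in itertools.combinations(sorted(proteins), 2)
        if frozenset((a, b)) not in positives
    ]
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical toy data, invented purely for illustration.
localization = {
    "P1": {"nucleus"},
    "P2": {"nucleus"},
    "P3": {"nucleus"},
    "P4": {"mitochondrion"},
}
positives = {frozenset(("P1", "P2"))}
proteins = list(localization)

loc_neg = localization_negatives(proteins, localization, positives)
rand_neg = random_negatives(proteins, positives, n=3)
```

In this toy setting the localization-based negatives can never contain a co-localized pair such as P1 and P3, whereas the random sample can. That restriction on the negative distribution is the kind of constraint the abstract describes as making the prediction task easier.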
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
186 articles.