Author:
Chen Zhengzhang, Padmanabhan Kanchana, Rocha Andrea M, Shpanskaya Yekaterina, Mihelcic James R, Scott Kathleen, Samatova Nagiza F
Abstract
Background: The latent behavior of a biological cell is complex. Deriving the underlying simplicity, or the fundamental rules governing this behavior, has been the Holy Grail of systems biology. Data-driven prediction of the system components, and of the interplays among them, that are responsible for the target system’s phenotype is a key and challenging step in this endeavor.
Results: The proposed approach, which we call System Phenotype-related Interplaying Components Enumerator (Spice), iteratively enumerates statistically significant system components that are hypothesized (1) to play an important role in defining the specificity of the target system’s phenotype(s); (2) to exhibit functionally coherent behavior, namely, to act in a coordinated manner to perform the phenotype-specific function; and (3) to improve the predictive skill for the system’s phenotype(s) when used collectively in an ensemble of predictive models. Spice can be applied to both instance-based data and network-based data. When validated, Spice effectively identified system components related to three target phenotypes: biohydrogen production, motility, and cancer. Manual curation of the results agreed with the known phenotype-related system components reported in the literature. Additionally, using the identified system components as discriminatory features improved the prediction accuracy by 10% on the phenotype-classification task compared to a number of state-of-the-art methods applied to eight benchmark microarray data sets.
Conclusion: We formulate a problem, the enumeration of phenotype-determining system component interplays, and propose an effective methodology (Spice) to address it. Spice improved the identification of cancer-related groups of genes from various microarray data sets and detected groups of genes associated with microbial biohydrogen production and motility, many of which were reported in the literature. Spice also improved the predictive skill of phenotype determination compared to individual classifiers and/or other ensemble methods, such as bagging, boosting, random forest, nearest shrunken centroid, and the random forest variable selection method.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Modeling and Simulation, Structural Biology
Cited by
7 articles.