Author:
Annest Amalia,Bumgarner Roger E,Raftery Adrian E,Yeung Ka Yee
Abstract
Abstract
Background
Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.
Results
We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139).
Conclusion
The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference64 articles.
1. Li J, Duan Y, Ruan X: A Novel Hybrid Approach to Selecting Marker Genes for Cancer Classification Using Gene Expression Data. The 1st International Conference on Bioinformatics and Biomedical Engineering, 2007, ICBBE. 2007, 264-267.
2. Liu H, Motoda H: Feature Selection for Knowledge Discovery and Data Mining. 1998, Boston: Kluwer Academic Publishers
3. Liu H, Motoda H: Computational Methods of Feature Selection. Chapman & Hall/CRC data mining and knowledge discovery series. 2008, Boca Raton: Chapman & Hall/CRC Press
4. Nguyen D, Rocke D: Tumor classification by Partial Least Squares Using Microarray Gene Expression Data. Bioinformatics. 2002, 18: 39-50. 10.1093/bioinformatics/18.1.39.
5. Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caliqiuri M, Bloomfield C, Lander E: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science. 1999, 286: 531-537. 10.1126/science.286.5439.531.
Cited by
43 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献