Affiliations:
1. Vanderbilt University Medical Center, Nashville, TN;
2. Vanderbilt Univ Medcl Ctr, Nashville, TN;
3. Baylor College of Medicine, Houston, TX
Abstract
635
Background: Colorectal cancer (CRC) remains a leading cause of cancer-related mortality in the United States. A key therapeutic dilemma in CRC treatment is whether patients with stage II or stage III disease require adjuvant chemotherapy after surgical resection. Efforts to better identify patients at increased risk of recurrence have produced many predictive models based on gene expression data, but none is FDA approved and none is used in standard clinical practice. To improve recurrence prediction, we use a machine learning approach to predict recurrence status 3 years after diagnosis.
Methods: A dataset was curated from six publicly available microarray datasets, and multiple views were generated to capture information from non-tumor tissue gene expression patterns, gene set structure, protein-protein interaction network structure, previously curated molecular signatures, and identified tumor suppressor/driver mutations. These views were used to train a diverse pool of base learners with 10 × 10-fold cross-validation (10-fold cross-validation repeated 10 times). Stacked generalization was then used to train an ensemble model (a meta-learner) on the predictions of these base learners.
Results: Models trained on microarray data performed significantly better than models trained on clinical data (paired Wilcoxon signed-rank test, p = 1.49 × 10⁻⁸), demonstrating that molecular data predict recurrence significantly better than basic clinical data. A comparison of training performance showed that non-linear classifiers often outperformed linear classifiers and that ensemble methods could further improve performance. We also demonstrate the feasibility of the multiple-view, multiple-learner (MVML) supervised learning framework for generating and integrating predictions across a diverse set of learners: the meta-learner matched or exceeded the best base learners on all performance metrics.
Conclusions: This work represents the first effort to use ensemble learning to predict CRC recurrence and highlights the promise of ensemble learning for improving predictive models and realizing the goals of precision medicine.
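The Methods describe training base learners on several data views with repeated cross-validation and then fitting a meta-learner on their held-out predictions (stacked generalization). The following is a minimal plain-Python sketch of that general pattern under stated assumptions; it is not the authors' MVML implementation. The view names, the `CentroidLearner` and `LogisticLearner` classes, and the `fit_stacked` helper are hypothetical, and the two toy learners stand in for the abstract's "diverse pool" of base learners. As a simplification, each base learner's out-of-fold predictions are averaged over the repeats before the meta-learner is fit.

```python
import math
import random

# Minimal stacked-generalization sketch over multiple "views" (column subsets).
# All names here are hypothetical illustrations, not the MVML code itself.

def sigmoid(z):
    z = max(-500.0, min(500.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def select(x, cols):
    """Project one sample onto the columns that make up a view."""
    return [x[c] for c in cols]

class CentroidLearner:
    """Toy base learner: P(y=1) from relative distance to the two class centroids."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

class LogisticLearner:
    """Logistic regression fit by batch gradient descent (used as base or meta-learner)."""

    def __init__(self, lr=0.5, epochs=300):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # log-loss gradient w.r.t. the logit
                for j in range(d):
                    gw[j] += err * x[j]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba(self, x):
        return sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)

def stratified_folds(y, k, rng):
    """Split row indices into k folds, dealing each class out round-robin."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def out_of_fold_meta_features(X, y, views, learners, k=10, repeats=10, seed=0):
    """One meta-feature per (view, learner) pair: held-out P(y=1), averaged over repeats."""
    pairs = [(v, l) for v in views for l in learners]
    sums = [[0.0] * len(pairs) for _ in X]
    rng = random.Random(seed)
    for _ in range(repeats):  # k-fold cross-validation repeated `repeats` times
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            for p, (v, l) in enumerate(pairs):
                cols = views[v]
                model = learners[l]().fit([select(X[i], cols) for i in train],
                                          [y[i] for i in train])
                for i in fold:
                    sums[i][p] += model.predict_proba(select(X[i], cols))
    return pairs, [[s / repeats for s in row] for row in sums]

def fit_stacked(X, y, views, learners, k=10, repeats=10, seed=0):
    """Fit the meta-learner on out-of-fold predictions; refit base learners on all rows."""
    pairs, meta = out_of_fold_meta_features(X, y, views, learners, k, repeats, seed)
    meta_learner = LogisticLearner().fit(meta, y)
    bases = [(views[v], learners[l]().fit([select(x, views[v]) for x in X], y))
             for v, l in pairs]

    def predict_proba(x):
        return meta_learner.predict_proba([m.predict_proba(select(x, cols))
                                           for cols, m in bases])

    return meta, predict_proba
```

A call such as `fit_stacked(X, y, {"expr": [0, 1], "noise": [2]}, {"centroid": CentroidLearner, "logistic": LogisticLearner})` (with hypothetical view names and column indices) returns the meta-feature matrix and a prediction function. The key design choice is that the meta-learner only sees predictions on rows the corresponding base learner did not train on; otherwise it would learn to trust base learners that merely overfit.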
Publisher: American Society of Clinical Oncology (ASCO)
Cited by: 2 articles
1. Predictive models for colorectal cancer recurrence using multi-modal healthcare data;Proceedings of the Conference on Health, Inference, and Learning;2021-04-08
2. The Rise of Big Data in Oncology;Seminars in Oncology Nursing;2018-05