Author:
Adnan Nahim, Liu Zhijie, Huang Tim H.M., Ruan Jianhua
Abstract
Background
Discovering a highly accurate and robust gene signature for predicting breast cancer metastasis from gene expression profiles of primary tumors is one of the most challenging tasks in reducing the number of deaths in women. Because gene-based features have had limited success in achieving satisfactory prediction accuracy, many methods have been proposed in recent years that derive network-based features by integrating network information with gene expression. However, evaluation results have been too inconsistent to confirm the effectiveness of network-based features, because many confounding factors are involved in learning a classification model, such as data normalization, dimension reduction, and feature selection. An unbiased comparative evaluation is essential for uncovering the strength of network-based features.
Methods
In this study, we compared several types of network-based features, obtained by applying different mathematical operators (Mean, Maximum, Minimum, Median, Variance) to a geneset (i.e., a gene and its neighbors in the network) in a protein-protein interaction network and a gene co-expression network, for their ability to predict breast cancer metastasis using gene expression data from more than 10 patient cohorts.
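As an illustration of how such a feature can be derived, the sketch below applies one operator to the expression values of a gene and its network neighbors for a single patient sample. It is a minimal sketch, not the authors' implementation: the function name `network_features`, the dictionary-based data layout, the toy values, and the choice of population variance are all assumptions made for this example.

```python
from statistics import mean, median, pvariance

# The five operators named in the text. Using population variance
# (rather than sample variance) is an assumption of this sketch.
OPERATORS = {
    "mean": mean,
    "max": max,
    "min": min,
    "median": median,
    "variance": pvariance,
}


def network_features(expression, neighbors, operator):
    """Compute one network-based feature per gene for one sample.

    expression: dict mapping gene -> expression value in this sample
    neighbors:  dict mapping gene -> set of neighboring genes in the network
    operator:   key into OPERATORS
    """
    op = OPERATORS[operator]
    features = {}
    for gene, value in expression.items():
        # Geneset = the gene itself plus its neighbors that have
        # a measured expression value in this sample.
        geneset = [value] + [
            expression[n]
            for n in sorted(neighbors.get(gene, ()))
            if n != gene and n in expression
        ]
        features[gene] = op(geneset)
    return features


# Toy data (hypothetical): gene A is connected to B and C.
expression = {"A": 1.0, "B": 3.0, "C": 5.0}
neighbors = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

mean_features = network_features(expression, neighbors, "mean")
max_features = network_features(expression, neighbors, "max")
var_features = network_features(expression, neighbors, "variance")
```

In this toy network the hub gene A aggregates over three values while B and C aggregate over two, which shows how the number of values entering each feature depends on node degree.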
Results
While network-based features are usually statistically more significant than gene-based features, a consistent improvement in prediction performance using network-based features requires a substantial number of patients in the dataset. Contrary to many previous reports, no evidence was found to support the robustness of network-based features, and we argue that some of the previously reported robustness may be due to the inherent bias associated with node degree in the network. In addition, different types of network features seem to cover different pathways and are complementary to each other. Consequently, an ensemble classifier combining different network features was proposed and was found to significantly outperform classifiers based on gene-based features or on any single type of network-based feature.
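The abstract does not state how the ensemble combines its members. One common way to combine classifiers trained on different feature types is to average their per-patient scores, sketched below. The averaging rule, the function name `ensemble_scores`, and the toy scores are assumptions for illustration only and should not be read as the method used in the study.

```python
from statistics import mean


def ensemble_scores(scores_by_feature_type):
    """Average per-patient scores across classifiers.

    scores_by_feature_type: dict mapping a feature type (e.g. "mean")
    to a list of predicted metastasis scores, one per patient, all
    lists in the same patient order.
    """
    score_lists = list(scores_by_feature_type.values())
    n_patients = len(score_lists[0])
    if any(len(s) != n_patients for s in score_lists):
        raise ValueError("all classifiers must score the same patients")
    return [mean(s[i] for s in score_lists) for i in range(n_patients)]


# Hypothetical scores from three classifiers for two patients.
scores = {
    "mean": [0.25, 0.75],
    "max": [0.5, 0.5],
    "median": [0.75, 1.0],
}
combined = ensemble_scores(scores)
```

Averaging lets classifiers that capture different pathways contribute to the same prediction, which is one way the complementarity described above could be exploited.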
Conclusions
Network-based features and their combination show promise for improving the prediction of breast cancer metastasis but may require a large amount of training data. Claims about the robustness of network-based features need to be re-examined with network node degree and other confounding factors taken into consideration.
Publisher
Springer Science and Business Media LLC
Subject
Genetics (clinical), Genetics