Author:
Ma Baoshan, Chai Bingjie, Dong Heng, Qi Jishuang, Wang Pengcheng, Xiong Tong, Gong Yi, Li Di, Liu Shuxin, Song Fengju
Abstract
The potential role of DNA methylation from paracancerous tissues in cancer diagnosis has not been explored until now. In this study, we built classification models using well-known machine learning algorithms based on DNA methylation profiles of paracancerous tissues. We evaluated our methods on nine cancer datasets collected from The Cancer Genome Atlas (TCGA) and used fivefold cross-validation to assess model performance. Additionally, we performed gene ontology (GO) enrichment analysis on the significant CpG sites selected by the feature importance scores of the XGBoost model, aiming to identify biological pathways involved in cancer progression. We also used the XGBoost algorithm to classify cancer types from DNA methylation profiles of paracancerous tissues in external validation datasets. Comparative experiments suggested that XGBoost achieved better predictive performance than the other four machine learning methods in predicting cancer stage. GO enrichment analysis revealed key pathways involved, highlighting the importance of paracancerous tissues in cancer progression. Furthermore, the XGBoost model accurately classified nine different cancers from TCGA, and the feature sets selected by XGBoost also effectively predicted seven cancer types on independent GEO datasets. This study provides new insights into cancer diagnosis from an epigenetic perspective and may facilitate the development of personalized diagnosis and treatment strategies.
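The fivefold cross-validation protocol mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the data are synthetic beta values rather than TCGA profiles, the "early"/"late" stage labels are hypothetical, and a nearest-centroid classifier stands in for XGBoost so the example runs with the Python standard library alone. All function names here are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fit_centroids(X, y):
    """Stand-in model (NOT XGBoost): the mean methylation vector of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sqdist)


def cross_validate(X, y, k=5, seed=0):
    """Return one held-out accuracy per fold."""
    folds = stratified_folds(y, k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


def make_synthetic(n_per_class=25, n_sites=20, seed=1):
    """Synthetic beta values in [0, 1]; class means are arbitrary assumptions."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("early", 0.3), ("late", 0.7)):
        for _ in range(n_per_class):
            X.append([min(1.0, max(0.0, rng.gauss(mean, 0.1))) for _ in range(n_sites)])
            y.append(label)
    return X, y


X, y = make_synthetic()
scores = cross_validate(X, y)
mean_accuracy = sum(scores) / len(scores)
print(f"fold accuracies: {scores}, mean: {mean_accuracy:.3f}")
```

In a setting closer to the one described, the stand-in classifier would be replaced by a gradient-boosted tree model, and the per-fold accuracies would be averaged in the same way to compare methods.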
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Cited by
7 articles.