Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma

Author:

Li Meng-Xiang,Sun Xiao-Meng,Cheng Wei-Gang,Ruan Hao-Jie,Liu Ke,Chen Pan,Xu Hai-Jun,Gao She-Gan,Feng Xiao-Shan,Qi Yi-JunORCID

Abstract

Abstract Background A plethora of prognostic biomarkers for esophageal squamous cell carcinoma (ESCC) that have hitherto been reported are challenged with low reproducibility due to high molecular heterogeneity of ESCC. The purpose of this study was to identify the optimal biomarkers for ESCC using machine learning algorithms. Methods Biomarkers related to clinical survival, recurrence or therapeutic response of patients with ESCC were determined through literature database searching. Forty-eight biomarkers linked to recurrence or prognosis of ESCC were used to construct a molecular interaction network based on NetBox and then to identify the functional modules. Publicably available mRNA transcriptome data of ESCC downloaded from Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) datasets included GSE53625 and TCGA-ESCC. Five machine learning algorithms, including logical regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF) and XGBoost, were used to develop classifiers for prognostic classification for feature selection. The area under ROC curve (AUC) was used to evaluate the performance of the prognostic classifiers. The importances of identified molecules were ranked by their occurrence frequencies in the prognostic classifiers. Kaplan-Meier survival analysis and log-rank test were performed to determine the statistical significance of overall survival. Results A total of 48 clinically proven molecules associated with ESCC progression were used to construct a molecular interaction network with 3 functional modules comprising 17 component molecules. The 131,071 prognostic classifiers using these 17 molecules were built for each machine learning algorithm. Using the occurrence frequencies in the prognostic classifiers with AUCs greater than the mean value of all 131,071 AUCs to rank importances of these 17 molecules, stratifin encoded by SFN was identified as the optimal prognostic biomarker for ESCC, whose performance was further validated in another 2 independent cohorts. Conclusion The occurrence frequencies across various feature selection approaches reflect the degree of clinical importance and stratifin is an optimal prognostic biomarker for ESCC.

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Subject

Cancer Research,Genetics,Oncology

Cited by 17 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3