EMLI-ICC: an ensemble machine learning-based integration algorithm for metastasis prediction and risk stratification in intrahepatic cholangiocarcinoma

Author:

Ruan Jian1, Xu Shuaishuai1, Chen Ruyin1, Qu Wenxin2, Li Qiong1, Ye Chanqi1, Wu Wei1, Jiang Qi1, Yan Feifei1, Shen Enhui3, Chu Qinjie3, Jia Yunlu1, Zhang Xiaochen1, Fu Wenguang4, Chen Jinzhang5, Timko Michael P6, Zhao Peng1, Fan Longjiang3, Shen Yifei7

Affiliation:

1. Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China

2. Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, People's Republic of China

3. Institute of Bioinformatics, Zhejiang University, People's Republic of China

4. Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, People's Republic of China

5. Department of Oncology, Nanfang Hospital, Southern Medical University, People's Republic of China

6. Lewis and Clark Professor of Biology, Department of Biology, and Professor of Public Health Sciences, University of Virginia, U.S.A.

7. Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, & Institute of Laboratory Medicine, Zhejiang University, People's Republic of China

Abstract

Robust strategies for identifying patients at high risk of tumor metastasis, which is frequently observed in intrahepatic cholangiocarcinoma (ICC), remain limited. Gene/protein expression profiling holds great potential for cancer diagnosis and prognosis, but previously developed protocols that use multiple diagnostic signatures for expression-based metastasis prediction have not been widely applied, because batch effects and differences between data types greatly reduce the predictive performance of expression profile-based signatures in interlaboratory and cross-data-type validation. To address this problem and support more precise diagnosis, we performed a genome-wide integrative proteome and transcriptome analysis and developed an ensemble machine learning-based integration algorithm for metastasis prediction (EMLI-Metastasis) and risk stratification (EMLI-Prognosis) in ICC. Based on large proteome (216) and transcriptome (244) data sets, 132 feature (biomarker) genes were selected and used to train the EMLI-Metastasis algorithm. To detect metastasis in ICC patients accurately, we developed a weighted ensemble machine learning method based on the k-Top Scoring Pairs (k-TSP) method. This approach generates one metastasis classifier for each bootstrap-aggregated (bagged) training data set, yielding 10 binary, expression rank-based classifiers that each detect metastasis separately. To further improve accuracy, the 10 binary classifiers were combined by weighted voting based on the score from the prediction results of each classifier. The EMLI-Metastasis algorithm achieved prediction accuracies of 97.1% and 85.0% in the proteome and transcriptome data sets, respectively. From the 132 feature genes, 21 gene-pair signatures were developed to establish a metastasis-related prognostic risk-stratification model for ICC (EMLI-Prognosis). Under the EMLI-Prognosis algorithm, patients in the high-risk group had significantly worse overall survival than those in the low-risk group in the clinical cohort (P-value < 0.05). Taken together, the EMLI-ICC algorithm provides a powerful and robust means of accurate metastasis prediction and risk stratification across proteome and transcriptome data types, and it is superior to the clinicopathological features currently used for patients with ICC. The algorithm could have profound implications not only for clinical care through improved prediction of cancer metastasis risk, but also, more broadly, for the development of machine learning-based multi-cohort diagnostic methods. To make the EMLI-ICC algorithm easily accessible for clinical application, we established a web-based server for metastasis risk prediction (http://ibi.zju.edu.cn/EMLI/).
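The general idea of a bagged, weighted-vote k-TSP ensemble can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the stratified bootstrap, the choice of k = 3 disjoint pairs, the use of out-of-bag accuracy as each classifier's voting weight, and the synthetic data are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def fit_ktsp(X, y, k=3):
    """Pick up to k disjoint feature pairs whose rank order differs most between classes."""
    X0, X1 = X[y == 0], X[y == 1]
    # p_c[i, j] = fraction of class-c samples in which feature i < feature j
    p0 = (X0[:, :, None] < X0[:, None, :]).mean(axis=0)
    p1 = (X1[:, :, None] < X1[:, None, :]).mean(axis=0)
    delta = p1 - p0
    score = np.triu(np.abs(delta), 1)  # score each unordered pair once
    pairs, used = [], set()
    for flat in np.argsort(score, axis=None)[::-1]:  # best-scoring pairs first
        i, j = (int(v) for v in np.unravel_index(flat, score.shape))
        if score[i, j] == 0 or len(pairs) == k:
            break
        if i in used or j in used:
            continue
        # Orient each pair (a, b) so that "a < b" is a vote for class 1.
        pairs.append((i, j) if delta[i, j] > 0 else (j, i))
        used.update((i, j))
    return pairs

def predict_ktsp(pairs, X):
    """Majority vote over the rank comparisons of the selected pairs."""
    if not pairs:
        return np.zeros(len(X), dtype=int)
    votes = np.stack([X[:, a] < X[:, b] for a, b in pairs], axis=1)
    return (votes.mean(axis=1) > 0.5).astype(int)

def fit_ensemble(X, y, n_models=10, k=3, seed=0):
    """Train one k-TSP classifier per bootstrap resample of the training set."""
    rng = np.random.default_rng(seed)
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    models = []
    for _ in range(n_models):
        # Assumption: resample within each class so both classes stay present.
        boot = np.concatenate([rng.choice(idx0, idx0.size),
                               rng.choice(idx1, idx1.size)])
        pairs = fit_ktsp(X[boot], y[boot], k)
        oob = np.setdiff1d(np.arange(len(y)), boot)
        # Assumption: the voting weight is the out-of-bag accuracy.
        acc = (predict_ktsp(pairs, X[oob]) == y[oob]).mean() if oob.size else 1.0
        models.append((pairs, max(float(acc), 1e-6)))
    return models

def predict_ensemble(models, X):
    """Combine the binary classifiers by weighted voting."""
    total = sum(w for _, w in models)
    score = sum(w * predict_ktsp(p, X) for p, w in models) / total
    return (score > 0.5).astype(int)

def make_toy(n, seed):
    """Synthetic data: features 0-2 are higher and 3-5 lower in class 1; 6-7 are noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, 8))
    sign = np.where(y == 1, 1.0, -1.0)[:, None]
    X[:, :3] += 3.0 * sign
    X[:, 3:6] -= 3.0 * sign
    return X, y

X_train, y_train = make_toy(60, seed=1)
X_test, y_test = make_toy(40, seed=2)
models = fit_ensemble(X_train, y_train)
pred = predict_ensemble(models, X_test)
acc = float((pred == y_test).mean())
print(f"toy test accuracy: {acc:.2f}")
```

Because each classifier depends only on the within-sample ordering of gene pairs, its predictions are unchanged by any monotone rescaling of a sample's values, which is the usual motivation for rank-based methods when batches or data types differ.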

Funder

Zhejiang Provincial Natural Science Foundation

National Natural Science Foundation of China

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
