Cervical cancer prediction using machine learning models based on blood routine analysis

Author:

Su Jie1, Lu Hui2, Zhang RuiHuan3, Cui Na4, Chen Chao3, Si Qin4, Song Biao3

Affiliation:

1. Inner Mongolia Medical University

2. Inner Mongolia University

3. Medical Intelligent Diagnostics Big Data Research Institute

4. Peking University Cancer Hospital (Inner Mongolia Campus/Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Autonomous Region Cancer Center Gynecological oncology)

Abstract

Background and objective: Cervical cancer is the fourth most common cancer among women globally. The key to its prevention and treatment is early detection, diagnosis and treatment. We aimed to develop an interpretable model to predict the risk of cervical cancer from routine blood test data, and used the Shapley Additive Explanations (SHAP) method to explain the model and explore factors associated with cervical cancer.

Methods: Medical records of patients from 2013 to 2023 were collected for a retrospective study. A total of 2533 patients with cervical cancer formed the case group, and 9879 apparently healthy subjects formed the control group. Using age, clinical diagnosis information and 22 blood cell analysis results, four different algorithms were used to construct cervical cancer prediction models.

Results: Using lasso regression and the random forest method, 15 important routine blood features were selected from 23 features for model training. The XGBoost model had the highest predictive performance among the four models, with an area under the curve (AUC) of 0.964, whereas random forest (RF) had the poorest generalization ability (AUC = 0.907). The SHAP method revealed the top 6 predictors of cervical cancer by importance ranking, and the average of the PDW was recognized as the most important predictor variable.

Conclusion: We selected the best machine learning (ML) model based on performance and ranked the importance of features according to SHAP values. Compared with the other algorithms, XGBoost (XGB) had the best performance for predicting cervical cancer recurrence and was adopted to establish the prediction model.
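The idea behind SHAP-based feature ranking can be illustrated with a minimal sketch that computes exact Shapley values for a single sample. This is not the authors' pipeline: the toy scoring function, its coefficients, the feature names other than PDW and age, and the sample values are all hypothetical, and features outside a coalition are simply replaced by a baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features not in the coalition are set to their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical risk score; coefficients are made up for illustration."""
    pdw, age, wbc = features
    return 0.5 * pdw + 0.02 * age + 0.1 * pdw * wbc


# Hypothetical sample and baseline (e.g. a reference "average" subject)
x = [2.0, 50.0, 1.0]
baseline = [1.0, 40.0, 0.0]

phi = shapley_values(toy_model, x, baseline)
for name, value in zip(["pdw", "age", "wbc"], phi):
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets per-feature values be read as a decomposition of the model output. Ranking features by the mean absolute contribution over many samples gives an importance ordering of the kind the abstract describes.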

Publisher

Springer Science and Business Media LLC
