Early detection of nasopharyngeal carcinoma through machine‐learning‐driven prediction model in a population‐based healthcare record database

Author:

Chen Jeng‐Wen (1, 2, 3, 4), Lin Shih‐Tsang (1, 3), Lin Yi‐Chun (5), Wang Bo‐Sian (6), Chien Yu‐Ning (7), Chiou Hung‐Yi (5, 6)

Affiliation:

1. Department of Otolaryngology–Head and Neck Surgery, Cardinal Tien Hospital and School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan

2. Department of Medical Education and Research, Cardinal Tien Hospital, New Taipei City, Taiwan

3. Department of Otolaryngology–Head and Neck Surgery, National Taiwan University Hospital, Taipei, Taiwan

4. Department of Education and Research, Cardinal Tien Junior College of Healthcare and Management, New Taipei City, Taiwan

5. School of Public Health, Taipei Medical University, Taipei, Taiwan

6. Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan

7. Department of Health and Welfare, University of Taipei, Taiwan

Abstract

Objective: Early diagnosis and treatment of nasopharyngeal carcinoma (NPC) are vital for a better prognosis. However, because the tumor arises in an obscure anatomical site and its symptoms are insidious, nearly 80% of patients with NPC are diagnosed at a late stage. This study aimed to validate a machine learning (ML) model that uses symptom‐related diagnoses and procedures in medical records to predict NPC occurrence and shorten the prediagnostic period.

Materials and Methods: Data from a population‐based health insurance database (2001–2008) were analyzed, comparing adults with and without newly diagnosed NPC. Medical records from 90 to 360 days before diagnosis were examined. Five ML algorithms (Light Gradient Boosting Machine [LGB], eXtreme Gradient Boosting [XGB], Multivariate Adaptive Regression Splines [MARS], Random Forest [RF], and Logistic Regression [LG]) were evaluated for optimal early NPC detection. The final model was further tested on real‐world data from 1 million randomly selected individuals. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Shapley values were used to identify the variables that contributed most to the predictions.

Results: LGB showed the highest predictive power, using 14 features and data from 90 days before diagnosis. On the test dataset, the LGB model achieved an AUROC, specificity, and sensitivity of 0.83, 0.81, and 0.64, respectively. The LGB‐driven NPC predictive tool effectively separated patients into high‐risk and low‐risk groups (hazard ratio: 5.85; 95% CI: 4.75–7.21), indicating that the model's risk stratification (layering) effect is valid.

Conclusions: ML approaches using electronic medical records accurately predicted NPC occurrence. The risk prediction model can serve as a low‐cost digital screening tool, offering rapid medical decision support to shorten prediagnostic periods. Timely referral is crucial for high‐risk patients identified by the model.
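The three performance metrics reported in the abstract (AUROC, specificity, and sensitivity) can be illustrated with a minimal sketch. The labels, scores, and the 0.5 threshold below are made‐up toy values, not data or settings from the study, and the function names are hypothetical. The pairwise AUROC computation is chosen for readability and would be too slow for a population‐scale dataset.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are flagged high-risk."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (assumed values): 1 = case, 0 = control; scores are model risk outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]

toy_auroc = auroc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print(f"AUROC={toy_auroc:.2f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

AUROC is independent of any threshold, whereas sensitivity and specificity depend on the cutoff chosen to separate high‐risk from low‐risk individuals, so a reported pair such as 0.64 and 0.81 reflects one particular operating point.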

Funder

Cardinal Tien Hospital

Publisher

Wiley

References: 48 articles.
