Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation

Published:2021-08-30 Issue:8 Volume:9 Page:e29807
ISSN:2291-9694
Container-title:JMIR Medical Informatics
language:en
Short-container-title:JMIR Med Inform

Author:

Lee Eunsaem, Jung Se Young, Hwang Hyung Ju, Jung Jaewoo

Abstract

Background: Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are among the more useful resources for this purpose. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed.

Objective: We aimed to develop cancer prediction models using nationwide claim databases with machine learning algorithms that are explainable and easily applicable in real-world environments.

Methods: As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health checkup every 2 years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods, which use deep learning–based anomaly detection to analyze high-dimensional data effectively. Performance was measured with the area under the curve and the area under the precision-recall curve. We validated our models externally with a health checkup database from a tertiary hospital.

Results: The one-class embedding classifier model achieved the highest area under the curve scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For the area under the precision-recall curve, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively.

Conclusions: Our results show that applicable cancer prediction models can be developed easily from nationwide claim data using machine learning. The 7 models showed acceptable performance and explainability and can thus be distributed easily in real-world environments.

Publisher

JMIR Publications Inc.

Subject

Health Information Management, Health Informatics

References (36 articles)

1. Global Cancer Observatory. World Health Organization. 2021-04-13. https://gco.iarc.fr/

2. Cancer is a Preventable Disease that Requires Major Lifestyle Changes

3. Nationwide breast cancer screening programme fully implemented in the Netherlands

4. Assessment of nationwide cancer-screening programmes

Cited by 8 articles.

1. Development and Validation of a Colorectal Cancer Prediction Model: A Nationwide Cohort-Based Study;Digestive Diseases and Sciences;2024-04-25

2. Colorectal Cancer Epidemiology, Screening and Segmentation using CNNs;2023 International Conference on New Frontiers in Communication, Automation, Management and Security (ICCAMS);2023-10-27

3. Optimizing prognostic factors of five-year survival in gastric cancer patients using feature selection techniques with machine learning algorithms: a comparative study;BMC Medical Informatics and Decision Making;2023-04-06

4. A Brief Review of Explainable Artificial Intelligence Reviews and Methods;Explainable Machine Learning for Multimedia Based Healthcare Applications;2023

5. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022);Computer Methods and Programs in Biomedicine;2022-11