Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review

Published:2024-01 Volume:4
ISSN:2765-8783
Container-title:Health Data Science
language:en
Short-container-title:Health Data Sci

Author:

Zhang Meng¹²,Zheng Yongqi¹²,Maidaiti Xiagela³,Liang Baosheng⁴,Wei Yongyue¹²,Sun Feng¹²

Affiliation:

1. Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China.

2. Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China.

3. Peking University First Hospital, Beijing, China.

4. Department of Biostatistics, School of Public Health, Peking University, Beijing, China.

Abstract

Background: Disease prediction models often rely on either statistical methods or machine learning. Each has its own application scenarios, and using either alone raises the risk of errors. Integrating machine learning into statistical methods may yield more robust prediction models. This systematic review aims to comprehensively assess the current development of integrated disease prediction models worldwide.

Methods: PubMed, Embase, Web of Science, CNKI, VIP, WanFang, and SinoMed were searched for studies on prediction models integrating machine learning into statistical methods, from database inception to May 1, 2023. Extracted information included basic study characteristics, integration approaches, application scenarios, modeling details, and model performance.

Results: A total of 21 eligible studies were included, 20 in English and 1 in Chinese. Five studies focused on diagnostic models, and 16 focused on predicting disease occurrence or prognosis. Integration strategies for classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models used simple statistics, weighted statistics, and stacking. In most studies, integration models achieved an AUROC above 0.75 and outperformed both statistical methods and machine learning used alone. Stacking was used in situations with >100 predictors and required a relatively larger amount of training data.

Conclusion: Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies show strong potential for integration models to outperform single models. This study provides insights for selecting integration methods in different scenarios. Future research could emphasize the improvement and validation of integration strategies.
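As an illustration only, the majority voting and weighted voting strategies named in the abstract could be sketched as below. This is a minimal sketch, not the procedure of any reviewed study: the function names, the three base models, their predicted risks, and the use of validation AUROC as voting weights are all hypothetical assumptions. Stacking is not shown because it requires fitting a meta-learner on base-model outputs.

```python
def majority_vote(labels):
    """Return the class predicted by most base models (ties go to class 1)."""
    return 1 if sum(labels) * 2 >= len(labels) else 0


def weighted_vote(probs, weights, threshold=0.5):
    """Weighted average of predicted risks, then a threshold on the average."""
    score = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return score, int(score >= threshold)


# Hypothetical predicted risks for one patient from three base models:
# one statistical model (logistic regression) and two machine learning models.
probs = [0.62, 0.41, 0.55]
# Hypothetical weights, e.g. each model's AUROC on a validation set.
weights = [0.78, 0.74, 0.80]

labels = [int(p >= 0.5) for p in probs]   # per-model class labels: [1, 0, 1]
voted_label = majority_vote(labels)       # two of three models predict class 1
score, weighted_label = weighted_vote(probs, weights)
print(voted_label, round(score, 3), weighted_label)
```

Majority voting uses only each model's class label, whereas weighted voting keeps the predicted risks and lets better-performing models contribute more to the combined score.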

Funder

Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund Frontier Project

Publisher

American Association for the Advancement of Science (AAAS)

Link

https://spj.science.org/doi/pdf/10.34133/hds.0165
