The Optimal Machine Learning-Based Missing Data Imputation for the Cox Proportional Hazard Model

Published: 2021-07-05 Volume: 9
ISSN: 2296-2565
Container-title: Frontiers in Public Health
Short-container-title: Front. Public Health

Author:

Guo Chao-Yu, Yang Ying-Chen, Chen Yi-Hau

Abstract

An adequate imputation of missing data preserves statistical power and helps avoid erroneous conclusions. In the era of big data, machine learning is a useful tool for inferring missing values. The root mean square error (RMSE) and the proportion of falsely classified entries (PFC) are two standard statistics for evaluating imputation accuracy. However, the behavior of the Cox proportional hazards model under various types of imputation requires deliberate study, and its validity under different missing mechanisms is unknown. In this research, we propose supervised and unsupervised imputations and examine four machine learning-based imputation strategies. We conducted a simulation study under various scenarios, varying parameters such as sample size, missing rate, and missing mechanism. The results reveal the type-I error rates of the different imputation techniques in survival data. The simulation results show that the non-parametric “missForest” based on the unsupervised imputation is the only robust method without inflated type-I errors under all missing mechanisms. In contrast, the other methods do not yield valid tests when the missing pattern is informative. Improperly conducted statistical analysis of data with missing values may lead to erroneous conclusions. This research provides a clear guideline for a valid survival analysis using the Cox proportional hazards model with machine learning-based imputations.
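The two accuracy statistics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: RMSE is computed over continuous entries that were masked and then imputed, and PFC is the share of masked categorical entries imputed with the wrong category. The function names and the toy values are hypothetical, and the paper's exact definitions may differ (some software, for instance, reports a normalized RMSE).

```python
import math


def rmse(true_values, imputed_values):
    """Root mean square error over imputed continuous entries."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pfc(true_labels, imputed_labels):
    """Proportion of falsely classified entries over imputed categorical entries."""
    if len(true_labels) != len(imputed_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, i in zip(true_labels, imputed_labels) if t != i)
    return wrong / len(true_labels)


# Hypothetical values at the positions that were masked before imputation.
true_continuous = [1.0, 2.0, 3.0, 4.0]
imputed_continuous = [1.0, 2.0, 3.0, 6.0]
true_categorical = ["a", "b", "a", "b"]
imputed_categorical = ["a", "b", "b", "b"]

print(rmse(true_continuous, imputed_continuous))   # squared errors 0, 0, 0, 4
print(pfc(true_categorical, imputed_categorical))  # one of four labels is wrong
```

Both statistics measure how closely imputed values match the truth; they do not by themselves indicate whether a downstream Cox model test keeps its nominal type-I error, which is the question the abstract raises.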

Publisher

Frontiers Media SA

Subject

Public Health, Environmental and Occupational Health

References: 20 articles.

1. Income nonresponses in the current population survey;Ono,1969

2. An overview of hot-deck procedures;Ford;Incom Data Sample Surv.,1983

3. A review of hot deck imputation for survey non-response;Andridge;Int Stat Rev.,2010

Cited by 17 articles.

1. Pseudo datasets explain artificial neural networks;International Journal of Data Science and Analytics;2024-04-10

2. Machine Learning and Health Science Research: Tutorial;Journal of Medical Internet Research;2024-01-30

3. A Machine Learning-Based Multiple Imputation Method for the Health and Aging Brain Study–Health Disparities;Informatics;2023-10-11

4. Analysis of Missing Health Care Data by Effective Adaptive DASO Based Naive Bayesian Model;Journal of Machine and Computing;2023-10-05

5. A simulation study on missing data imputation for dichotomous variables using statistical and machine learning methods;Scientific Reports;2023-06-09