A Machine Learning-Based Multiple Imputation Method for the Health and Aging Brain Study–Health Disparities

Author:

Zhang Fan12ORCID,Petersen Melissa12,Johnson Leigh13,Hall James12,Palmer Raymond F.4,O’Bryant Sid E.12,

Affiliation:

1. Institute for Translational Research, University of North Texas Health Science Center, Fort Worth, TX 76107, USA

2. Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX 76107, USA

3. Department of Pharmacology and Neuroscience, University of North Texas Health Science Center, Fort Worth, TX 76107, USA

4. Department of Family and Community Medicine, University of Texas Health Science Center, San Antonio, TX 78229, USA

Abstract

The Health and Aging Brain Study–Health Disparities (HABS–HD) project seeks to understand the biological, social, and environmental factors that impact brain aging among diverse communities. A common issue for HABS–HD is missing data. It is impossible to achieve accurate machine learning (ML) if data contain missing values. Therefore, developing a new imputation methodology has become an urgent task for HABS–HD. The three missing data assumptions, (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR), necessitate distinct imputation approaches for each mechanism of missingness. Several popular imputation methods, including listwise deletion, min, mean, predictive mean matching (PMM), classification and regression trees (CART), and missForest, may result in biased outcomes and reduced statistical power when applied to downstream analyses such as testing hypotheses related to clinical variables or utilizing machine learning to predict AD or MCI. Moreover, these commonly used imputation techniques can produce unreliable estimates of missing values if they do not account for the missingness mechanisms or if there is an inconsistency between the imputation method and the missing data mechanism in HABS–HD. Therefore, we proposed a three-step workflow to handle missing data in HABS–HD: (1) missing data evaluation, (2) imputation, and (3) imputation evaluation. First, we explored the missingness in HABS–HD. Then, we developed a machine learning-based multiple imputation method (MLMI) for imputing missing values. We built four ML-based imputation models (support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and lasso and elastic-net regularized generalized linear model (GLMNET)) and adapted the four ML-based models to multiple imputations using the simple averaging method. Lastly, we evaluated and compared MLMI with other common methods. Our results showed that the three-step workflow worked well for handling missing values in HABS–HD and the ML-based multiple imputation method outperformed other common methods in terms of prediction performance and change in distribution and correlation. The choice of missing handling methodology has a significant impact on the accompanying statistical analyses of HABS–HD. The conceptual three-step workflow and the ML-based multiple imputation method perform well for our Alzheimer’s disease models. They can also be applied to other disease data analyses.

Funder

National Institute on Aging of the National Institutes of Health

Publisher

MDPI AG

Subject

Computer Networks and Communications,Human-Computer Interaction,Communication

Reference28 articles.

1. (2023, February 13). Alzheimer’s Association 2022 Alzheimer’s Disease Facts and Figures. Available online: https://www.alz.org/alzheimers-dementia/facts-figures.

2. Proteomic Profiles of Neurodegeneration Among Mexican Americans and Non-Hispanic Whites in the HABS-HD Study;Zhang;J. Alzheimers Dis.,2022

3. Missing data analysis: Making it work in the real world;Graham;Annu. Rev. Psychol.,2009

4. Missing data: Our view of the state of the art;Schafer;Psychol. Methods,2002

5. Inference and Missing Data;Rubin;Biometrika,1976

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3