Author:
Shaw Crystal,Wu Yingyan,Zimmerman Scott C,Hayes-Larson Eleanor,Belin Thomas R,Power Melinda C,Glymour M Maria,Mayeda Elizabeth Rose
Abstract
Abstract
Incomplete longitudinal data are common in life-course epidemiology and may induce bias leading to incorrect inference. Multiple imputation (MI) is increasingly preferred for handling missing data, but few studies explore MI-method performance and feasibility in real-data settings. We compared 3 MI methods using real data under 9 missing-data scenarios, representing combinations of 10%, 20%, and 30% missingness and missing completely at random, at random, and not at random. Using data from Health and Retirement Study (HRS) participants, we introduced record-level missingness to a sample of participants with complete data on depressive symptoms (1998–2008), mortality (2008–2018), and relevant covariates. We then imputed missing data using 3 MI methods (normal linear regression, predictive mean matching, variable-tailored specification), and fitted Cox proportional hazards models to estimate effects of 4 operationalizations of longitudinal depressive symptoms on mortality. We compared bias in hazard ratios, root mean square error, and computation time for each method. Bias was similar across MI methods, and results were consistent across operationalizations of the longitudinal exposure variable. However, our results suggest that predictive mean matching may be an appealing strategy for imputing life-course exposure data, given consistently low root mean square error, competitive computation times, and few implementation challenges.
Funder
National Institutes of Health, National Center for Advancing Translational Sciences
National Institutes of Health, National Institute on Aging
Publisher
Oxford University Press (OUP)