Comparison between inverse-probability weighting and multiple imputation in Cox model with missing failure subtype


Guo Fuyu (1), Langworthy Benjamin (2), Ogino Shuji (1, 3, 4, 5), Wang Molin (1, 6, 7, 8)


1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA

2. Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA

3. Cancer Immunology and Cancer Epidemiology Programs, Dana-Farber Harvard Cancer Center, Boston, MA, USA

4. Program in MPE Molecular Pathological Epidemiology, Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

5. Broad Institute of MIT and Harvard, Cambridge, MA, USA

6. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

7. Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA

8. Harvard Medical School, Boston, MA, USA


Identifying and distinguishing risk factors for heterogeneous disease subtypes is of great interest, but missing subtype information is a common problem in such analyses. Several methods have been proposed to handle the missing data, including complete-case analysis, inverse-probability weighting, and multiple imputation. Although the existing literature has compared these methods for missing-data problems, none has focused on the competing risk setting. In this paper, we discuss the assumptions required when complete-case analysis, inverse-probability weighting, and multiple imputation are used to handle missing failure subtypes, focusing on how to implement these methods under various realistic scenarios in competing risk settings. We also compare the three methods in terms of bias, efficiency, and robustness to model misspecification using simulation studies. Our results show that complete-case analysis can be seriously biased when the missing completely at random assumption does not hold. Inverse-probability weighting and multiple imputation estimators are valid when the corresponding missingness and imputation models are correctly specified, and multiple imputation typically shows higher efficiency than inverse-probability weighting. In real-world studies, however, building imputation models for the missing subtypes can be more challenging than building missingness models; in that case, inverse-probability weighting may be preferred for its ease of use. We also propose two automated model selection procedures and demonstrate their use in a study of the association between smoking and colorectal cancer subtypes in the Nurses’ Health Study and Health Professional Follow-Up Study.
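
As a rough illustration of why complete-case analysis can be biased when subtype missingness depends on a covariate, and how inverse-probability weighting can correct it, the following minimal sketch may help. It is not the Cox-model procedure studied in the paper: it uses a deliberately simplified estimand (the proportion of cases with subtype 1), a single hypothetical binary covariate `x`, and made-up probabilities chosen only for the demonstration. All function names are assumptions introduced here.

```python
import random


def simulate(n=200_000, seed=0):
    """Simulate cases with a binary covariate, a subtype, and a missingness flag.

    All probabilities below are hypothetical values chosen for illustration.
    Missingness depends only on x (missing at random given x).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_subtype1 = 0.8 if x == 1 else 0.3   # subtype depends on x
        subtype1 = 1 if rng.random() < p_subtype1 else 0
        p_observed = 0.9 if x == 1 else 0.4   # missingness depends on x
        observed = rng.random() < p_observed
        rows.append((x, subtype1, observed))
    return rows


def complete_case(rows):
    """Proportion of subtype 1 among cases whose subtype is observed."""
    seen = [s for _, s, r in rows if r]
    return sum(seen) / len(seen)


def ipw(rows):
    """Inverse-probability-weighted proportion of subtype 1.

    The missingness model here is just the observed fraction within each
    level of x; each observed case is weighted by 1 / P(observed | x).
    """
    p_hat = {}
    for level in (0, 1):
        flags = [r for x, _, r in rows if x == level]
        p_hat[level] = sum(flags) / len(flags)
    num = sum(s / p_hat[x] for x, s, r in rows if r)
    den = sum(1 / p_hat[x] for x, _, r in rows if r)
    return num / den


rows = simulate()
true_value = 0.5 * 0.8 + 0.5 * 0.3  # marginal P(subtype 1) under the setup above
cc_estimate = complete_case(rows)
ipw_estimate = ipw(rows)
print(f"true={true_value:.3f} complete-case={cc_estimate:.3f} ipw={ipw_estimate:.3f}")
```

In this toy setup the complete-case estimate over-represents cases with `x = 1`, because their subtype is observed more often, while the weighted estimate should land close to the true marginal proportion. In a Cox analysis the same weights would enter the partial likelihood instead of a simple average.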


National Institutes of Health

Dana-Farber Cancer Institute


SAGE Publications


Health Information Management, Statistics and Probability, Epidemiology






