Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias

Published: 2023-09-15 Volume: 3
ISSN: 2674-1199
Container-title: Frontiers in Epidemiology
Short-container-title: Front. Epidemiol.

Author:

Curnow Elinor, Tilling Kate, Heron Jon E., Cornish Rosie P., Carpenter James R.

Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a “collider”), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
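
The collider mechanism described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' simulation design: the data-generating model, the coefficients, the variable names (`u`, `w`, `y`, `z`) and the helper `mi_mean` are all hypothetical. To stay short it targets the mean of the partially observed variable rather than the exposure coefficient studied in the paper, and it uses repeated stochastic regression imputation that ignores parameter uncertainty, which is a simplification of full MI.

```python
import numpy as np

rng = np.random.default_rng(2023)
n = 100_000

# Hypothetical data-generating model (an assumption for illustration only).
u = rng.normal(size=n)          # unmeasured cause of both Y and Z
w = rng.normal(size=n)          # unmeasured cause of both Z and missingness in Y
y = u + rng.normal(size=n)      # partially observed variable, true mean 0
z = u + w + rng.normal(size=n)  # candidate auxiliary variable: a collider (U -> Z <- W)

# Missingness in Y depends only on W, which is independent of Y,
# so the complete records are a random sample with respect to Y.
p_obs = 1.0 / (1.0 + np.exp(-1.5 * w))
observed = rng.random(n) < p_obs


def mi_mean(y, observed, predictors, m=5):
    """Mean of y averaged over m stochastic regression imputations.

    Simplified sketch: the imputation model is fitted once to the complete
    records and parameter uncertainty is ignored, so this shows the point
    estimate only, not a valid MI variance.
    """
    design = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    resid_sd = np.std(y[observed] - design[observed] @ beta)
    n_missing = int((~observed).sum())
    estimates = []
    for _ in range(m):
        y_imp = y.copy()
        noise = rng.normal(scale=resid_sd, size=n_missing)
        y_imp[~observed] = design[~observed] @ beta + noise
        estimates.append(y_imp.mean())
    return float(np.mean(estimates))


cra_mean = float(y[observed].mean())      # complete records analysis
mi_without_z = mi_mean(y, observed, [])   # imputation model without the collider
mi_with_z = mi_mean(y, observed, [z])     # imputation model including the collider

print(f"complete records: {cra_mean:.3f}")
print(f"MI without Z:     {mi_without_z:.3f}")
print(f"MI with Z:        {mi_with_z:.3f}")
```

Under these assumptions the complete records estimate and the imputation without `z` should both sit close to the true mean of zero, while including `z` should pull the estimate away from zero. Conditioning on `z` opens the path between `y` and its missingness through `u` and `w`, so the relationship between `y` and `z` estimated in the complete records no longer applies to the records with missing `y`.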

Publisher

Frontiers Media SA

Cited by 2 articles.

1. Enhancing data integrity in Electronic Health Records: Review of methods for handling missing data; 2024-05-13

2. Multiple imputation assuming missing at random: auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random; 2023-10-17