Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification

Author:

Lee Katherine J (1,2), Carlin John B (1,2,3), Simpson Julie A (3), Moreno-Betancur Margarita (1,2)

Affiliation:

1. Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Melbourne, Australia

2. Department of Paediatrics, University of Melbourne, Australia

3. Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia

Abstract

Researchers faced with incomplete data are encouraged to consider whether their data are ‘missing completely at random’ (MCAR), ‘missing at random’ (MAR) or ‘missing not at random’ (MNAR) when planning their analysis. However, there are two major problems with this classification as originally defined by Rubin in the 1970s. First, when there are missing data in multiple variables, the plausibility of the MAR assumption is difficult to assess using substantive knowledge and is more stringent than is generally appreciated. Second, although MCAR and MAR are sufficient conditions for consistent estimation with specific methods, they are not necessary conditions and therefore this categorization does not directly determine the best approach for handling the missing data in an analysis. How best to handle missing data depends on the assumed causal relationships between variables and their missingness, and what these relationships imply in terms of the ‘recoverability’ of the target estimand (the population parameter that encodes the answer to the underlying research question). Recoverability is defined as whether the estimand can be consistently estimated from the patterns and associations in the observed data without needing to invoke external information on the extent to which the distribution of missing values might differ from that of observed values. In this manuscript we outline an approach for deciding which method to use to handle multivariable missing data in an analysis, using directed acyclic graphs to depict missingness assumptions and determining the implications in terms of recoverability of the target estimand.
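The claim that MCAR and MAR are sufficient but not necessary can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating model, the missingness mechanism, and the names `simulate` and `ols_slope` are all assumptions chosen for illustration. Missingness in the exposure `x` depends on the value of `x` itself (so the data are not MAR), yet it does not depend on the outcome `y` given `x`. Under that assumed mechanism, the regression slope of `y` on `x` should be recoverable from the complete cases, while the mean of `x` should not be.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Hypothetical data: y = 1 + 2x + noise, with x missing depending on x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
    # Assumed missingness mechanism: the probability that x is observed
    # falls as x rises. It depends on x only, not on y given x.
    p_observed = 1.0 / (1.0 + np.exp(-(0.5 - 1.5 * x)))
    observed = rng.random(n) < p_observed
    return x, y, observed


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    x_centred = x - x.mean()
    return float(np.dot(x_centred, y - y.mean()) / np.dot(x_centred, x_centred))


x, y, observed = simulate()

# Estimand 1: regression slope (true value 2), from complete cases only.
cc_slope = ols_slope(x[observed], y[observed])

# Estimand 2: mean of x (true value 0), from complete cases only.
cc_mean_x = float(x[observed].mean())
full_mean_x = float(x.mean())

print(f"complete-case slope:     {cc_slope:.3f}")
print(f"complete-case mean of x: {cc_mean_x:.3f}")
print(f"full-data mean of x:     {full_mean_x:.3f}")
```

In this sketch the same missingness mechanism leaves one estimand estimable from the observed data and the other not, which is why the choice of method is tied to the target estimand rather than to the MCAR/MAR/MNAR label alone.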

Funder

Australian National Health and Medical Research Council

NHMRC Career Development Fellowship

NHMRC

Australian Research Council Discovery Early Career Researcher Award

Murdoch Children's Research Institute

Victorian Government's Operational Infrastructure Support

Publisher

Oxford University Press (OUP)

Subject

General Medicine, Epidemiology

Reference: 25 articles.

1. Inference and missing data;Rubin;Biometrika,1976

2. Missing data assumptions;Little;Annu Rev Stat Its Appl,2021

3. Diagnosing missing always at random in multivariate data;Bojinov;Biometrika,2019

4. What is meant by ‘missing at random’?;Seaman;Stat Sci,2013

5. Graphical models for processing missing data;Mohan;J Am Stat Assoc,2021
