A cautionary tale on using imputation methods for inference in matched-pairs design

Published:2020-02-12 Issue:10 Volume:36 Page:3099-3106
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Ramosaj Burim¹, Amro Lubna¹, Pauly Markus¹

Affiliation:

1. Faculty of Statistics, Institute of Mathematical Statistics and Applications in Industry, Technical University of Dortmund, Dortmund 44227, Germany

Abstract

Motivation: Imputation procedures in biomedical fields have turned into statistical practice, since further analyses can be conducted ignoring the former presence of missing values. In particular, non-parametric imputation schemes like the random forest have shown favorable imputation performance compared to the more traditionally used MICE procedure. However, their effect on valid statistical inference has not been analyzed so far. This article closes this gap by investigating their validity for inferring mean differences in incompletely observed pairs while opposing them to a recent approach that only works with the given observations at hand.

Results: Our findings indicate that machine-learning schemes for (multiply) imputing missing values may inflate type I error or result in comparably low power in small-to-moderate matched pairs, even after modifying the test statistics using Rubin’s multiple imputation rule. In addition to an extensive simulation study, an illustrative data example from a breast cancer gene study has been considered.

Availability and implementation: The corresponding R-code can be accessed through the authors and the gene expression data can be downloaded at www.gdac.broadinstitute.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
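The abstract refers to modifying the test statistic with Rubin's multiple imputation rule. The following is a minimal Python sketch of that pooling step for a paired mean difference. It is not the authors' R code. The function names `paired_mean_difference` and `pool_rubin`, the toy data, and the use of the classical (large-sample) degrees of freedom are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def paired_mean_difference(pairs):
    """Return the mean of the within-pair differences and its squared standard error.

    `pairs` is one completed (already imputed) dataset of (x, y) tuples.
    """
    d = [x - y for x, y in pairs]
    return mean(d), variance(d) / len(d)


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    Returns (pooled estimate, total variance, classical degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b   # total variance
    if b == 0:
        df = math.inf                 # imputations agree exactly
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


# Hypothetical example: three completed versions of the same four matched pairs,
# differing only in the values that were imputed (made-up numbers).
completed = [
    [(5.1, 4.0), (6.3, 5.9), (4.8, 4.1), (5.5, 5.0)],
    [(5.1, 4.0), (6.3, 5.6), (4.8, 4.4), (5.5, 5.0)],
    [(5.1, 4.0), (6.3, 6.1), (4.8, 3.9), (5.5, 5.0)],
]
results = [paired_mean_difference(data) for data in completed]
q_bar, total, df = pool_rubin([r[0] for r in results], [r[1] for r in results])
t_stat = q_bar / math.sqrt(total)  # compare with a t distribution on df degrees of freedom
print(q_bar, total, df, t_stat)
```

The pooled statistic is referred to a t distribution with `df` degrees of freedom. The article's finding is that such pooled tests may still inflate the type I error or lose power in small-to-moderate samples, so this sketch shows the mechanics only and is not a recommended procedure.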

Funder

German Academic Exchange Service

Research Grants—Doctoral Programmes

German Research Foundation

DFG

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa082/32739204/btaa082.pdf

References: 38 articles.

1. Permuting incomplete paired data: a novel exact and asymptotic correct randomization test;Amro;J. Stat. Comput. Simul,2017

2. Multiplication-combination tests for incomplete paired data;Amro;Stat. Med.,2019

3. Small-sample degrees of freedom with multiple imputation;Barnard;Biometrika,1999

4. Testing equality of means of correlated variates with missing observations on both responses;Bhoj;Biometrika,1978

5. Multiple imputation for missing data via sequential regression trees;Burgette;Am. J. Epidemiol,2010

Cited by 14 articles.

1. Enhancing endangered species monitoring by lowering data entry requirements with imputation techniques as a preprocessing step for the footprint identification technology (FIT);Ecological Informatics;2024-09

2. Assessing the multivariate distributional accuracy of common imputation methods;Statistical Journal of the IAOS;2024-03-15

3. Analyzing the Effect of Imputation on Classification Performance under MCAR and MAR Missing Mechanisms;Entropy;2023-03-17

4. Preterm Prelabor Rupture of Membranes Linked to Vaginal Bacteriome of Pregnant Females in the Early Second Trimester: a Case-Cohort Design;Reproductive Sciences;2023-02-01

5. Estimating Gaussian Copulas with Missing Data with and without Expert Knowledge;Entropy;2022-12-19