New Trends in Evidence-based Statistics: Data Imputation Problems

Author:

Kovtun N. V., Fataliieva A.-N. Ya.

Abstract

The main reasons for omissions are: (1) exclusion of the subject from the study due to non-compliance with study requirements; (2) the occurrence of an adverse event; (3) a missing result; (4) lack of registration; (5) researchers' acts of omission and/or commission.

We can define the following data gap limits: (1) omissions of less than 5% are insignificant and do not affect the research results; (2) data losses of 20% or more call the integrity of the research results into question. The higher the share of missing data, the less reliable the conclusions and the more difficult it is to demonstrate treatment efficacy. Consequently, missing data are a potential source of bias in the analysis. Exclusion of subjects can affect the comparability of groups and subgroups, which leads to biased estimates.

There are different ways to deal with missing data. The simplest is to exclude the subject from the calculations. The consequences of this approach are a reduced sample size, weaker grounds for statistical inference, and a changed confidence interval (e.g. narrowing that results from underestimated variances). Hence, when dealing with missing data it is important to identify the nature of the omission, which can be missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). The nature of the omission determines the appropriate method for processing data with missing values: exclusion, filling, weighting or modeling. These methods give different results for different volumes and types of omissions.

We evaluated the results of different imputation methods on a sample in which different proportions of missing data were simulated. With 10% of MCAR omissions, the parameter estimates and p-values for two factors obtained with the first group of methods were close to the results from the complete data. The mean squared errors obtained by filling with the absolute average and by filling gaps with successive selection were closer to the benchmark; all other methods overestimated it. The coefficient of determination was almost identical to that of the original data when filling gaps with successive selection was applied.

With 25% of MCAR omissions, the treatment group factor became insignificant when filling with absolute and conditional averages was applied. The lowest estimate of the coefficient of determination was obtained when filling with absolute averages, and the overestimation was smallest when filling gaps with successive selection. The changes were minimal with the other approaches. The parameter estimates and p-values from the available-case analysis were closest to those of the regression on the complete data.

With 50% of MCAR omissions, pre-treatment weight became insignificant when complete-case analysis was applied, and the treatment group factor became insignificant when filling gaps with successive selection was applied. The most accurate estimate for the pre-treatment weight variable was obtained with the conditional average method. However, filling with the absolute average stands out: its results were the closest to the original data.

With 10% and 50% of MAR omissions, the changes in the parameter estimates for the intercept and the two factors were minimal for every imputation method. Multiple imputation gave the mean squared error and coefficient of determination closest to those obtained from the complete data.

This study identifies the weaknesses and strengths of different data imputation methods and shows how their relative effectiveness depends on the share of missing information. The results indicate that the approach to imputation cannot be "one-size-fits-all": the imputation problem should be solved case by case through analysis of the existing database, taking into account not only the characteristics of the data and the volume of omissions, but also the expected contribution of the particular study.
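The kind of experiment described above can be illustrated with a minimal sketch. It is not the authors' procedure or data: the variable names, coefficients, sample size and the choice to delete values of the outcome are all assumptions made for illustration. It simulates MCAR omissions at 10%, 25% and 50%, then compares complete-case analysis with filling by the unconditional mean, using only the Python standard library.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of the simple least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def make_mcar(values, share, rng):
    """Copy of values with a given share set to None, chosen completely at random."""
    n_missing = round(share * len(values))
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


def complete_case(xs, ys):
    """Keep only the pairs whose outcome is observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def mean_impute(values):
    """Fill every gap with the mean of the observed values."""
    mean = statistics.fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


rng = random.Random(0)
n = 500
# Hypothetical data: x plays the role of a pre-treatment covariate, y of the outcome.
x = [rng.gauss(70, 10) for _ in range(n)]
y = [5 + 0.8 * xi + rng.gauss(0, 5) for xi in x]

full_slope = ols_slope(x, y)
results = {}
for share in (0.10, 0.25, 0.50):
    y_missing = make_mcar(y, share, rng)
    cc_x, cc_y = complete_case(x, y_missing)
    y_filled = mean_impute(y_missing)
    results[share] = {
        "complete_case_slope": ols_slope(cc_x, cc_y),
        "mean_imputed_slope": ols_slope(x, y_filled),
        # Variance of the filled outcome relative to the observed values only.
        "variance_ratio": statistics.pvariance(y_filled) / statistics.pvariance(cc_y),
    }
    r = results[share]
    print(f"{share:.0%} missing: complete-case slope {r['complete_case_slope']:.3f}, "
          f"mean-imputed slope {r['mean_imputed_slope']:.3f}, "
          f"variance ratio {r['variance_ratio']:.2f}")
```

In this sketch the variance ratio equals the share of observed values by construction, which is one way mean filling understates variability. The mean-imputed slope shrinks toward zero by roughly the same factor, while the complete-case slope stays near the full-data value under MCAR at the cost of a smaller sample.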

Publisher

National Academy of Statistics, Accounting and Audit

